[jira] [Commented] (LUCENE-3233) HuperDuperSynonymsFilter™

Michael McCandless (JIRA) Wed, 06 Jul 2011 06:47:39 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060578#comment-13060578
 ]


Michael McCandless commented on LUCENE-3233:
--------------------------------------------

Actually, maybe a better general fix for FST would be for it to dynamically 
decide whether to use an array for a node's arcs, based on how many bytes 
would be wasted (in addition to the number of arcs / depth of the node).  This 
way we could always turn arrays on, and FST would pick the right times to use 
them.  If we stick to only 1 byte for the number of bytes per arc, the FST 
could simply not use the array when an arc is larger than 255 bytes.
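
A rough sketch of the kind of check described above, to make the idea 
concrete.  This is not the FST code: the class name, method name, and both 
thresholds (MIN_ARCS, MAX_WASTE_RATIO) are made up for illustration, and node 
depth is left out.  It assumes an array pads every arc to the width of the 
widest arc, so the waste is the padded size minus the bytes actually needed.

```java
public final class ArcArrayHeuristic {
    // Largest arc size that fits when the per-arc size is stored in one unsigned byte.
    static final int MAX_BYTES_PER_ARC = 255;
    // Hypothetical threshold: below this many arcs an array is not worth it.
    static final int MIN_ARCS = 4;
    // Hypothetical threshold: largest fraction of the array allowed to be padding.
    static final double MAX_WASTE_RATIO = 0.5;

    /**
     * Decides whether a node's arcs should be stored as a fixed-width array.
     *
     * @param arcSizes serialized size in bytes of each outgoing arc of the node
     */
    public static boolean shouldUseArray(int[] arcSizes) {
        if (arcSizes.length < MIN_ARCS) {
            return false;
        }
        int widest = 0;
        long total = 0;
        for (int size : arcSizes) {
            widest = Math.max(widest, size);
            total += size;
        }
        // The slot width would not fit in the single size byte.
        if (widest > MAX_BYTES_PER_ARC) {
            return false;
        }
        // Every slot is padded to the widest arc; the difference is wasted space.
        long padded = (long) widest * arcSizes.length;
        long wasted = padded - total;
        return wasted <= MAX_WASTE_RATIO * padded;
    }
}
```

With a check like this, a node with one very wide arc among many narrow ones 
falls back to the non-array encoding, while nodes with similarly sized arcs 
get the array.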

> HuperDuperSynonymsFilter™
> -------------------------
>
>                 Key: LUCENE-3233
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3233
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>         Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
> LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
> LUCENE-3233.patch, synonyms.zip
>
>
> The current SynonymFilter uses a lot of RAM and CPU, especially at build 
> time.
> I think yesterday I heard about "huge synonyms files" three times.
> So, I think we should use an FST-based structure, sharing the inputs and 
> outputs.
> And we should be more efficient with the TokenStream API, e.g. using 
> save/restoreState instead of cloneAttributes()

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
