[ https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062386#comment-13062386 ]
Yonik Seeley commented on LUCENE-3233:
--------------------------------------

bq. Now, from the testing above, it looks like we are faster when syns actually match; if no syns match the two are around the same speed.

Oh cool! I was looking at "1692" for the SynonymsFilter and a drop from "~3000ms -> ~2000ms" for the FST version. I assumed Robert's last benchmark was building and not lookup (the 112527/22872).

bq. Separately: shouldn't we not have any syns in the default text_en field type?

I dunno... it's nice for both demonstration and testing (and it's in the current tutorial).

> HuperDuperSynonymsFilter™
> -------------------------
>
>                 Key: LUCENE-3233
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3233
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>         Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch,
> LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch,
> LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch,
> LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, synonyms.zip
>
>
> The current synonymsfilter uses a lot of ram and cpu, especially at build
> time.
> I think yesterday I heard about "huge synonyms files" three times.
> So, I think we should use an FST-based structure, sharing the inputs and
> outputs.
> And we should be more efficient with the tokenStream api, e.g. using
> save/restoreState instead of cloneAttributes()

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org