[ https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-3233:
--------------------------------
    Attachment: LUCENE-3233.patch

patch with a first random test, this one currently does 10 iterations where it adds random shit to the synonym map, then it analyzes 10k random strings (each time capturing the output, and replaying it back to ensure the thing is deterministic and doesn't have reuse bugs).

i also added the ignoreCase support.

the filter might have a reuse bug, see
ant test -Dtestcase=TestFSTSynonymMapFilter -Dtestmethod=testRandom -Dtests.seed=-4122723628721952592:244824441557739968

> HuperDuperSynonymsFilter™
> -------------------------
>
>                 Key: LUCENE-3233
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3233
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>         Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch
>
>
> The current synonymsfilter uses a lot of ram and cpu, especially at build time.
> I think yesterday I heard about "huge synonyms files" three times.
> So, I think we should use an FST-based structure, sharing the inputs and outputs.
> And we should be more efficient with the tokenStream api, e.g. using save/restoreState instead of cloneAttributes()
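
The shape of the random test described in the comment (capture each output, replay the same input on the same instance, compare) can be sketched roughly as below. This is a self-contained toy in plain Java. It is not the attached patch and does not use the Lucene API: `ToySynonymAnalyzer`, `runRandomTest`, the whitespace tokenization, and the single-token synonym rules are all hypothetical stand-ins chosen only to make the idea runnable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Random;

class RandomSynonymCheck {

    /** Hypothetical stand-in for a reusable synonym filter; NOT the Lucene API. */
    static final class ToySynonymAnalyzer {
        private final Map<String, List<String>> synonyms = new HashMap<>();
        private final boolean ignoreCase;
        // Kept across calls on purpose, to mimic per-instance state that could leak.
        private final List<String> buffer = new ArrayList<>();

        ToySynonymAnalyzer(boolean ignoreCase) {
            this.ignoreCase = ignoreCase;
        }

        private String norm(String s) {
            return ignoreCase ? s.toLowerCase(Locale.ROOT) : s;
        }

        void add(String input, String output) {
            synonyms.computeIfAbsent(norm(input), k -> new ArrayList<>()).add(output);
        }

        /** Splits on whitespace and emits each token followed by its synonyms. */
        List<String> analyze(String text) {
            buffer.clear(); // omitting this reset is the kind of reuse bug the replay catches
            for (String token : text.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;
                }
                buffer.add(token);
                buffer.addAll(synonyms.getOrDefault(norm(token), Collections.<String>emptyList()));
            }
            return new ArrayList<>(buffer); // copy, so the captured output stays stable
        }
    }

    /** Short mixed-case word over a tiny alphabet, so rules and inputs collide often. */
    static String randomWord(Random r) {
        int len = 1 + r.nextInt(3);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < len; i++) {
            char c = (char) ('a' + r.nextInt(4));
            sb.append(r.nextBoolean() ? c : Character.toUpperCase(c));
        }
        return sb.toString();
    }

    static String randomString(Random r) {
        int words = 1 + r.nextInt(5);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(randomWord(r));
        }
        return sb.toString();
    }

    /** Returns the number of strings checked; throws on the first mismatch. */
    static int runRandomTest(long seed, int iterations, int stringsPerIteration) {
        Random r = new Random(seed);
        int checked = 0;
        for (int iter = 0; iter < iterations; iter++) {
            ToySynonymAnalyzer analyzer = new ToySynonymAnalyzer(r.nextBoolean());
            int rules = 1 + r.nextInt(10);
            for (int i = 0; i < rules; i++) {
                analyzer.add(randomWord(r), randomWord(r)); // random synonym map entries
            }
            for (int i = 0; i < stringsPerIteration; i++) {
                String text = randomString(r);
                List<String> captured = analyzer.analyze(text); // first pass: capture
                List<String> replayed = analyzer.analyze(text); // second pass: same instance
                if (!captured.equals(replayed)) {
                    throw new AssertionError("seed=" + seed + " text='" + text + "': "
                            + captured + " vs " + replayed);
                }
                checked++;
            }
        }
        return checked;
    }

    public static void main(String[] args) {
        // 10 iterations of 10k strings, mirroring the numbers in the comment.
        int checked = runRandomTest(System.nanoTime(), 10, 10_000);
        System.out.println("checked " + checked + " strings");
    }
}
```

Including the seed in the failure message plays the same role as the `-Dtests.seed=...` argument in the `ant test` command: a failing run can be reproduced exactly by passing that seed back in.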