[jira] [Updated] (LUCENE-3233) HuperDuperSynonymsFilter™

Robert Muir (JIRA) Tue, 05 Jul 2011 06:26:46 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Robert Muir updated LUCENE-3233:
--------------------------------

    Attachment: LUCENE-3233.patch

patch with a first random test. this one currently does 10 iterations where it 
adds random entries to the synonym map, then it analyzes 10k random strings 
(each time capturing the output and replaying it back, to ensure the filter is 
deterministic and doesn't have reuse bugs).
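
For readers without the patch at hand, the capture-and-replay idea can be 
sketched roughly as below. This is not the code from the attachment: 
ToyAnalyzer, RandomReplayCheck and the helper methods are hypothetical 
stand-ins that use only the JDK, and running the strings once per iteration 
is an assumption. The point is just that the same analyzer instance is reused 
for every string, and each string is analyzed twice and compared.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class RandomReplayCheck {

    // Hypothetical stand-in for the filter under test (NOT the Lucene class).
    // It keeps a buffer between calls, which is where reuse bugs tend to hide.
    static class ToyAnalyzer {
        private final Map<String, String> synonyms;
        private final List<String> buffer = new ArrayList<>();

        ToyAnalyzer(Map<String, String> synonyms) {
            this.synonyms = synonyms;
        }

        List<String> analyze(String text) {
            buffer.clear(); // skipping this reset would leak tokens from the previous call
            for (String token : text.split(" ")) {
                if (token.isEmpty()) {
                    continue;
                }
                buffer.add(token);
                String synonym = synonyms.get(token);
                if (synonym != null) {
                    buffer.add(synonym);
                }
            }
            return new ArrayList<>(buffer); // copy, so the result survives the next call
        }
    }

    // Random word of 1..3 letters drawn from a tiny alphabet, so synonyms actually match.
    static String randomWord(Random random) {
        int length = 1 + random.nextInt(3);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + random.nextInt(3)));
        }
        return sb.toString();
    }

    // Random text of 0..5 space-separated words.
    static String randomText(Random random) {
        int words = random.nextInt(6);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(randomWord(random));
        }
        return sb.toString();
    }

    // Returns true if every random string produces the same tokens when replayed.
    static boolean check(long seed, int iterations, int stringsPerIteration) {
        Random random = new Random(seed);
        for (int iter = 0; iter < iterations; iter++) {
            Map<String, String> synonyms = new HashMap<>();
            int entries = 1 + random.nextInt(10);
            for (int i = 0; i < entries; i++) {
                synonyms.put(randomWord(random), randomWord(random));
            }
            ToyAnalyzer analyzer = new ToyAnalyzer(synonyms); // one instance, reused below
            for (int i = 0; i < stringsPerIteration; i++) {
                String text = randomText(random);
                List<String> first = analyzer.analyze(text);
                List<String> replay = analyzer.analyze(text);
                if (!first.equals(replay)) {
                    System.err.println("mismatch for \"" + text + "\", seed=" + seed);
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long seed = args.length > 0 ? Long.parseLong(args[0]) : System.nanoTime();
        System.out.println("seed=" + seed + " ok=" + check(seed, 10, 10000));
    }
}
```

Passing the seed on the command line replays the exact same sequence of 
synonym entries and strings, which is what makes a failing random run 
reproducible.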

i also added the ignoreCase support.

the filter might have a reuse bug, see:

ant test -Dtestcase=TestFSTSynonymMapFilter -Dtestmethod=testRandom -Dtests.seed=-4122723628721952592:244824441557739968


> HuperDuperSynonymsFilter™
> -------------------------
>
>                 Key: LUCENE-3233
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3233
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>         Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, 
> LUCENE-3233.patch
>
>
> The current synonymsfilter uses a lot of ram and cpu, especially at build 
> time.
> I think yesterday I heard about "huge synonyms files" three times.
> So, I think we should use an FST-based structure, sharing the inputs and 
> outputs.
> And we should be more efficient with the tokenStream api, e.g. using 
> save/restoreState instead of cloneAttributes()

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
