[ 
https://issues.apache.org/jira/browse/SOLR-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058475#comment-13058475
 ] 

Michael McCandless commented on SOLR-2628:
------------------------------------------

I think the RAM reduction should be huge, but lookup might be slower 
(i.e., the usual FST tradeoff), since we would be going char by char through 
the FST.  If we go word by word (i.e., the FST's labels are word ords and we 
separately resolve word -> ord via a "normal" hash lookup), then that might 
be a good middle ground... but this is all speculation for now!
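
To make the word-by-word idea concrete, here is a rough sketch, not a patch. It uses a plain trie keyed on word ords as a stand-in for a real FST (so it shows the lookup shape, not the RAM savings), and the class and method names (WordOrdSynonymTrie, add, longestMatch) are made up for illustration and are not Lucene or Solr APIs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: synonym rules in a trie whose edge labels are word ords. */
class WordOrdSynonymTrie {
    // The "normal" hash lookup that resolves word -> ord.
    private final Map<String, Integer> wordToOrd = new HashMap<>();

    private static final class Node {
        final Map<Integer, Node> next = new HashMap<>(); // edges labeled by word ord
        String output;                                   // non-null if a rule ends here
    }

    private final Node root = new Node();

    // Assigns the next free ord to a word the first time it is seen.
    private int ordFor(String word) {
        Integer ord = wordToOrd.get(word);
        if (ord == null) {
            ord = wordToOrd.size();
            wordToOrd.put(word, ord);
        }
        return ord;
    }

    /** Adds a rule mapping a space-separated phrase to a replacement. */
    void add(String phrase, String replacement) {
        Node node = root;
        for (String word : phrase.split(" ")) {
            node = node.next.computeIfAbsent(ordFor(word), k -> new Node());
        }
        node.output = replacement;
    }

    /** Returns the replacement of the longest rule matching at tokens[start..], or null. */
    String longestMatch(List<String> tokens, int start) {
        Node node = root;
        String best = null;
        for (int i = start; i < tokens.size(); i++) {
            Integer ord = wordToOrd.get(tokens.get(i)); // one hash lookup per word
            if (ord == null) break;                     // word appears in no rule
            node = node.next.get(ord);                  // one transition per word, not per char
            if (node == null) break;
            if (node.output != null) best = node.output;
        }
        return best;
    }
}
```

Each input token costs one hash lookup plus one transition, instead of one transition per character; whether that wins in practice would need measuring.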


> use of FST for SynonymsFilterFactory and synonyms.txt
> -----------------------------------------------------
>
>                 Key: SOLR-2628
>                 URL: https://issues.apache.org/jira/browse/SOLR-2628
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>    Affects Versions: 3.4, 4.0
>         Environment: Linux
>            Reporter: Bernd Fehling
>            Assignee: Dawid Weiss
>            Priority: Minor
>              Labels: suggestion
>
> Currently the SynonymsFilterFactory builds up a memory-based SynonymsMap. 
> This can generate huge maps because of the permutations for synonyms.
> Now that FST (finite state transducer) has been introduced to Lucene, it 
> could also be used for synonyms.
> A tool can compile the synonyms.txt file to a binary automaton file, which 
> can then be used with SynonymsFilterFactory.
> Advantages:
> - faster start of Solr, no need to generate SynonymsMap
> - faster lookup
> - memory saving


        
