[ https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015634#comment-13015634 ]
Dawid Weiss commented on SOLR-2378:
-----------------------------------

The build step has to sort the input again (and create it in the first place). Because the Lookup API assumes suggestion keywords can come from a variety of sources, there is no guarantee they arrive sorted, so we need to sort them before we can build the automaton. Still, I think the numbers are acceptable: if you need online construction of suggestions, pick TST (it can add new keywords to the structure dynamically); for a batch-loaded suggester, pick the FST one.

It is also quite possible that I overlooked something that could bring those numbers down. I'll create a clean patch tomorrow so that everything is out there for others to improve.

> FST-based Lookup (suggestions) for prefix matches.
> --------------------------------------------------
>
>                 Key: SOLR-2378
>                 URL: https://issues.apache.org/jira/browse/SOLR-2378
>             Project: Solr
>          Issue Type: New Feature
>          Components: spellchecker
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>              Labels: lookup, prefix
>             Fix For: 4.0
>
>
> Implement a subclass of Lookup based on finite state automata/transducers
> (Lucene FST package). This issue is for implementing a relatively basic
> prefix matcher; we will handle infixes and other types of input matches
> gradually. Impl. phases:
> - write a DFA-based suggester effectively identical to the ternary tree based
>   solution right now,
> - baseline benchmark against the ternary tree (memory consumption, rebuilding
>   speed, indexing speed; reuse Andrzej's benchmark code)
> - modify the DFA to encode term weights directly in the automaton (optimize
>   for the onlyMostPopular case)
> - benchmark again
> - add infix suggestion support with prefix matches boosted higher (?)
> - benchmark again
> - modify the tutorial on the wiki [http://wiki.apache.org/solr/Suggester]
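To make the sorting requirement from the comment concrete: an automaton can be built incrementally and kept minimal when keys arrive in sorted order, because once a new key diverges from the previous one, the previous key's remaining suffix can never change again and can be minimized ("frozen") right away. The Java below is a hypothetical, standalone sketch of that idea, using incremental minimal acyclic DFA construction in the style of Daciuk et al. plus a naive prefix lookup. It is not Lucene's FST API; the class and method names (MinimalDfaSuggester, add, finish, complete, stateCount) are made up for illustration, and it ignores weights entirely.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical sketch (not Lucene's FST API): builds a minimal acyclic DFA
 * from keys supplied in sorted order and answers prefix queries by walking it.
 */
final class MinimalDfaSuggester {
    private static final class State {
        final TreeMap<Character, State> edges = new TreeMap<>(); // sorted labels
        boolean isFinal;
        int id = -1; // assigned once the state is frozen in the register
    }

    private final State root = new State();
    private final Map<String, State> register = new HashMap<>();
    private String previous;
    private int nextId;
    private boolean finished;

    /** Adds a key; keys must be strictly increasing in String order. */
    void add(String key) {
        if (finished) throw new IllegalStateException("already finished");
        if (previous != null && key.compareTo(previous) <= 0) {
            throw new IllegalArgumentException("input not sorted or duplicate: " + key);
        }
        // Follow the prefix shared with the previous key (the only unfrozen path).
        State state = root;
        int i = 0;
        while (i < key.length() && state.edges.containsKey(key.charAt(i))) {
            state = state.edges.get(key.charAt(i));
            i++;
        }
        // Sorted input guarantees nothing below this point on the previous
        // key's path will change again, so it can be minimized now.
        if (!state.edges.isEmpty()) replaceOrRegister(state);
        // Append the rest of the key as a fresh chain of states.
        for (; i < key.length(); i++) {
            State next = new State();
            state.edges.put(key.charAt(i), next);
            state = next;
        }
        state.isFinal = true;
        previous = key;
    }

    /** Minimizes the last unfrozen path; no keys can be added afterwards. */
    void finish() {
        if (!finished && !root.edges.isEmpty()) replaceOrRegister(root);
        finished = true;
    }

    private void replaceOrRegister(State state) {
        Map.Entry<Character, State> last = state.edges.lastEntry();
        State child = last.getValue();
        if (!child.edges.isEmpty()) replaceOrRegister(child); // freeze bottom-up
        String sig = signature(child);
        State equivalent = register.get(sig);
        if (equivalent != null) {
            state.edges.put(last.getKey(), equivalent); // share an identical state
        } else {
            child.id = nextId++;
            register.put(sig, child);
        }
    }

    // Equivalent frozen states have the same finality and the same outgoing
    // edges (label plus target id); all targets are already frozen here.
    private static String signature(State s) {
        StringBuilder sb = new StringBuilder(s.isFinal ? "F" : "N");
        for (Map.Entry<Character, State> e : s.edges.entrySet()) {
            sb.append('|').append((int) e.getKey().charValue()).append(':').append(e.getValue().id);
        }
        return sb.toString();
    }

    /** Returns all keys starting with prefix, in sorted order. */
    List<String> complete(String prefix) {
        State s = root;
        for (int i = 0; i < prefix.length() && s != null; i++) s = s.edges.get(prefix.charAt(i));
        List<String> out = new ArrayList<>();
        if (s != null) collect(s, new StringBuilder(prefix), out);
        return out;
    }

    private static void collect(State s, StringBuilder path, List<String> out) {
        if (s.isFinal) out.add(path.toString());
        for (Map.Entry<Character, State> e : s.edges.entrySet()) {
            path.append(e.getKey().charValue());
            collect(e.getValue(), path, out);
            path.setLength(path.length() - 1);
        }
    }

    /** Counts distinct reachable states, showing the effect of minimization. */
    int stateCount() {
        Set<State> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        ArrayDeque<State> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            State s = stack.pop();
            if (seen.add(s)) s.edges.values().forEach(stack::push);
        }
        return seen.size();
    }
}
```

For example, adding "car", "card" and "care" and then calling finish() leaves 5 states instead of the 6 a plain trie would need, because "card" and "care" share one final state; complete("car") returns [car, card, care]. Calling add("b") and then add("a") throws IllegalArgumentException, which is the sketch's version of why unsorted Lookup input must be sorted first. It also hints at the TST trade-off mentioned above: a structure that freezes branches as it goes cannot accept arbitrary new keys later, whereas a ternary tree can.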