[ https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016805#comment-13016805 ]
Dawid Weiss commented on SOLR-2378:
-----------------------------------

Well spotted, Robert -- indeed, three-byte codepoints were throwing automaton exceptions. I've added a test for this.

I also added "exact match promotion" to the top of the suggestions list, regardless of the score of the exact match. This is controlled by a final flag at the moment... maybe it should become a parameter, I don't know.

> FST-based Lookup (suggestions) for prefix matches.
> --------------------------------------------------
>
>                 Key: SOLR-2378
>                 URL: https://issues.apache.org/jira/browse/SOLR-2378
>             Project: Solr
>          Issue Type: New Feature
>          Components: spellchecker
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>              Labels: lookup, prefix
>             Fix For: 4.0
>
>         Attachments: SOLR-2378.patch
>
>
> Implement a subclass of Lookup based on finite state automata/ transducers (Lucene FST package). This issue is for implementing a relatively basic prefix matcher, we will handle infixes and other types of input matches gradually. Impl. phases:
> - -write a DFA based suggester effectively identical to ternary tree based solution right now,-
> - -baseline benchmark against tern. tree (memory consumption, rebuilding speed, indexing speed; reuse Andrzej's benchmark code)-
> - -modify DFA to encode term weights directly in the automaton (optimize for onlyMostPopular case)-
> - -benchmark again-
> - add infix suggestion support with prefix matches boosted higher (?)
> - benchmark again
> - modify the tutorial on the wiki [http://wiki.apache.org/solr/Suggester]
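
For readers following the thread, the "exact match promotion" described in the comment can be illustrated with a minimal sketch. This is not the code from SOLR-2378.patch: the class name, the Suggestion holder, and the EXACT_MATCH_FIRST constant are hypothetical stand-ins, and a plain list scan replaces the FST traversal. It only shows the ordering rule: an entry equal to the typed prefix goes first regardless of its weight, and the remaining prefix matches follow by descending weight.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names here are assumptions, not the Solr/Lucene API.
final class ExactMatchPromotionSketch {

    /** Hypothetical holder for a suggestion and its weight. */
    static final class Suggestion {
        final String term;
        final float weight;

        Suggestion(String term, float weight) {
            this.term = term;
            this.weight = weight;
        }
    }

    /** Hypothetical stand-in for the "final flag" mentioned in the comment. */
    static final boolean EXACT_MATCH_FIRST = true;

    /**
     * Returns up to {@code num} entries starting with {@code prefix}.
     * A linear scan stands in for the automaton lookup.
     */
    static List<Suggestion> suggest(List<Suggestion> dictionary, String prefix, int num) {
        List<Suggestion> matches = new ArrayList<>();
        Suggestion exact = null;

        for (Suggestion s : dictionary) {
            if (!s.term.startsWith(prefix)) {
                continue; // not a prefix match
            }
            if (EXACT_MATCH_FIRST && s.term.equals(prefix)) {
                exact = s; // set aside; it is promoted below
            } else {
                matches.add(s);
            }
        }

        // Order the remaining matches by descending weight.
        matches.sort((a, b) -> Float.compare(b.weight, a.weight));

        // Promote the exact match to the top, whatever its weight.
        if (exact != null) {
            matches.add(0, exact);
        }

        // Truncate to the requested number of suggestions.
        if (matches.size() > num) {
            return new ArrayList<>(matches.subList(0, num));
        }
        return matches;
    }
}
```

With entries "car" (weight 1), "card" (weight 10) and "care" (weight 5), a lookup for "car" in this sketch lists "car" first even though it has the lowest weight. Turning the constant into a method argument would be one way to make the behaviour a parameter, as the comment considers.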