[
https://issues.apache.org/jira/browse/LUCENE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Karl Wettin updated LUCENE-626:
-------------------------------
Comment: was deleted
> Extended spell checker with phrase support and adaptive user session analysis.
> ------------------------------------------------------------------------------
>
> Key: LUCENE-626
> URL: https://issues.apache.org/jira/browse/LUCENE-626
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Reporter: Karl Wettin
> Assigned To: Karl Wettin
> Priority: Minor
> Attachments: spellchecker.diff
>
>
> Some minor changes to how the single-token ngram spell checker in
> contrib/spellcheck works, but nothing that breaks any old implementation, I
> think. Also fixed the broken test.
> NgramPhraseSuggestier tokenizes a query and suggests combinations drawn from
> the matrix of single-token suggestions.
> Each combination must match as a query against an a priori index. By using a
> span near query (the default) you get features like this:
> assertEquals("lost in translation", ngramSuggester.didYouMean("lost on translation"));
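As a rough illustration of the "combinations of the single token suggestions matrix" idea, here is a minimal sketch. It is not the code from spellchecker.diff: the class and method names are hypothetical, and a plain in-memory set of known phrases stands in for the a priori index and the span near query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; not the classes from spellchecker.diff. */
class PhraseCombinationSketch {

    /** Builds every phrase that takes one candidate from each token's suggestion list. */
    static List<String> combinations(List<List<String>> matrix) {
        List<String> phrases = new ArrayList<>();
        phrases.add("");
        for (List<String> column : matrix) {
            List<String> next = new ArrayList<>();
            for (String prefix : phrases) {
                for (String candidate : column) {
                    next.add(prefix.isEmpty() ? candidate : prefix + " " + candidate);
                }
            }
            phrases = next;
        }
        return phrases;
    }

    /**
     * Returns the first combination contained in knownPhrases, or null.
     * Assumption: knownPhrases replaces the index lookup described in the issue.
     */
    static String didYouMean(List<List<String>> matrix, Set<String> knownPhrases) {
        for (String phrase : combinations(matrix)) {
            if (knownPhrases.contains(phrase)) {
                return phrase;
            }
        }
        return null;
    }
}
```

Each column of the matrix holds the original token plus its single-token suggestions, so the number of combinations grows multiplicatively with query length; a real implementation would presumably prune candidates before querying the index.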
> If term position vectors are available, it is possible to make it context
> sensitive (or whatever one may call it) and suggest a new term order:
> assertEquals("heroes might magic", ngramSuggester.didYouMean("magic light heros"));
> assertEquals("heroes of might and magic", ngramSuggester.didYouMean("heros on light and magik"));
> assertEquals("best game made", ngramSuggester.didYouMean("game best made"));
> assertEquals("game made", ngramSuggester.didYouMean("made game"));
> assertEquals("game made", ngramSuggester.didYouMean("made lame"));
> assertEquals("the game", ngramSuggester.didYouMean("the game"));
> assertEquals("in the fame", ngramSuggester.didYouMean("in the game"));
> assertEquals("game", ngramSuggester.didYouMean("same"));
> assertEquals(0, ngramSuggester.suggest("may game").size());
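The term reordering can be pictured with a much simpler stand-in. The sketch below does not read term position vectors from an index; it assumes a hypothetical in-memory list of known phrases and returns the first one that contains exactly the same tokens as the query, in whatever order the phrase stores them.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of order correction; not the patch's implementation. */
class TermOrderSketch {

    /** Order-independent key: the phrase's tokens, sorted and rejoined. */
    static String sortedKey(String phrase) {
        String[] tokens = phrase.trim().split("\\s+");
        Arrays.sort(tokens);
        return String.join(" ", tokens);
    }

    /** Returns the first known phrase built from the same tokens as the query, or null. */
    static String reorder(String query, List<String> knownPhrases) {
        String key = sortedKey(query);
        for (String phrase : knownPhrases) {
            if (sortedKey(phrase).equals(key)) {
                return phrase;
            }
        }
        return null;
    }
}
```

This only covers queries whose tokens are already spelled correctly; combining it with single-token suggestions is left out to keep the sketch short.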
> SessionAnalyzedDictionary is the adaptive layer that learns from how users
> changed their queries, what data they inspected, etc. It will automagically
> find and suggest synonyms, decomposed words, and probably a lot of other neat
> features I still have not detected.
> Depending a bit on the situation, ignored suggestions get suppressed and
> followed suggestions get suggested even more.
> assertEquals("the da vinci code", dictionary.didYouMean("thedavincicode"));
> assertEquals("the da vinci code", dictionary.didYouMean("the davinci code"));
> assertEquals("homm", dictionary.didYouMean("heroes of might and magic"));
> assertEquals("heroes of might and magic", dictionary.didYouMean("homm"));
> assertEquals("heroes of might and magic 2", dictionary.didYouMean("heroes of might and magic ii"));
> assertEquals("heroes of might and magic ii", dictionary.didYouMean("heroes of might and magic 2"));
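One way to picture the "ignored suggestions get suppressed, followed suggestions get suggested even more" behaviour is a per-query score table. This is only a sketch under assumptions: the class name, the methods followed and ignored, and the +1/-1 scoring are all hypothetical, and the issue does not say how SessionAnalyzedDictionary actually weighs user sessions.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical feedback table; not the SessionAnalyzedDictionary from the patch. */
class AdaptiveFeedbackSketch {

    // query -> (suggestion -> score); positive scores mean users tended to follow it
    private final Map<String, Map<String, Integer>> scores = new HashMap<>();

    /** The user accepted the suggestion for this query. */
    void followed(String query, String suggestion) {
        adjust(query, suggestion, 1);
    }

    /** The user was shown the suggestion for this query and did not take it. */
    void ignored(String query, String suggestion) {
        adjust(query, suggestion, -1);
    }

    private void adjust(String query, String suggestion, int delta) {
        scores.computeIfAbsent(query, q -> new HashMap<>())
              .merge(suggestion, delta, Integer::sum);
    }

    /** Returns the suggestion with the highest positive score, or null if there is none. */
    String didYouMean(String query) {
        Map<String, Integer> forQuery = scores.get(query);
        if (forQuery == null) {
            return null;
        }
        String best = null;
        int bestScore = 0;
        for (Map.Entry<String, Integer> entry : forQuery.entrySet()) {
            if (entry.getValue() > bestScore) {
                best = entry.getKey();
                bestScore = entry.getValue();
            }
        }
        return best;
    }
}
```

With a table like this, a user who types "homm" and then follows "heroes of might and magic" reinforces that pair, while a suggestion that keeps being ignored drops to a non-positive score and stops being offered.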
> The adaptive layer is not yet(tm) persistent, but it is soft referenced so
> that the dictionary doesn't eat up all your RAM.
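For readers unfamiliar with soft references, the following sketch shows the general pattern using java.lang.ref.SoftReference. The cache class and its methods are hypothetical and say nothing about how the patch organises its data; the point is only that the garbage collector may clear softly referenced values when memory runs low, so every read has to tolerate a missing value.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical soft-referenced cache; not the structure used in the patch. */
class SoftSuggestionCache {

    private final Map<String, SoftReference<String>> cache = new HashMap<>();

    /** Stores a suggestion that the garbage collector is allowed to reclaim. */
    void put(String query, String suggestion) {
        cache.put(query, new SoftReference<>(suggestion));
    }

    /** Returns the cached suggestion, or null if it is absent or was reclaimed. */
    String get(String query) {
        SoftReference<String> ref = cache.get(query);
        if (ref == null) {
            return null;
        }
        String value = ref.get();
        if (value == null) {
            // The referent was cleared by the garbage collector; drop the stale entry.
            cache.remove(query);
        }
        return value;
    }
}
```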