[ 
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4845:
---------------------------------------

    Attachment: LUCENE-4845.patch

New patch: adds the boolean allTermsRequired to the protected
finishTerms method, and fixes the ForkLastToken test case to only join
the last two tokens when the term is the same.

I also pushed the latest patch to the Jira search
(http://jirasearch.mikemccandless.com), which uses
AnalyzingInfixSuggester for the auto-suggest, and it seems to be
working.

Here are the benchmark results:
{noformat}
 -- construction time
 FuzzySuggester          input: 50001, time[ms]: 266 [+- 9.15]
 AnalyzingSuggester      input: 50001, time[ms]: 270 [+- 41.81]
 AnalyzingInfixSuggester input: 50001, time[ms]: 360 [+- 7.14]
 JaspellLookup           input: 50001, time[ms]: 22 [+- 4.23]
 TSTLookup               input: 50001, time[ms]: 75 [+- 1.48]
 FSTCompletionLookup     input: 50001, time[ms]: 127 [+- 3.34]
 WFSTCompletionLookup    input: 50001, time[ms]: 119 [+- 3.84]

 -- prefixes: 2-4, num: 7, onlyMorePopular: false
 FuzzySuggester          queries: 50001, time[ms]: 2130 [+- 12.05], ~kQPS: 23
 AnalyzingSuggester      queries: 50001, time[ms]: 642 [+- 8.80], ~kQPS: 78
 AnalyzingInfixSuggester queries: 50001, time[ms]: 863 [+- 9.50], ~kQPS: 58
 JaspellLookup           queries: 50001, time[ms]: 131 [+- 3.91], ~kQPS: 381
 TSTLookup               queries: 50001, time[ms]: 467 [+- 0.96], ~kQPS: 107
 FSTCompletionLookup     queries: 50001, time[ms]: 369 [+- 5.21], ~kQPS: 135
 WFSTCompletionLookup    queries: 50001, time[ms]: 291 [+- 4.64], ~kQPS: 172

 -- prefixes: 6-9, num: 7, onlyMorePopular: false
 FuzzySuggester          queries: 50001, time[ms]: 3216 [+- 14.12], ~kQPS: 16
 AnalyzingSuggester      queries: 50001, time[ms]: 275 [+- 4.10], ~kQPS: 182
 AnalyzingInfixSuggester queries: 50001, time[ms]: 656 [+- 10.20], ~kQPS: 76
 JaspellLookup           queries: 50001, time[ms]: 73 [+- 3.17], ~kQPS: 688
 TSTLookup               queries: 50001, time[ms]: 61 [+- 1.99], ~kQPS: 815
 FSTCompletionLookup     queries: 50001, time[ms]: 273 [+- 2.45], ~kQPS: 183
 WFSTCompletionLookup    queries: 50001, time[ms]: 86 [+- 3.49], ~kQPS: 579

 -- prefixes: 100-200, num: 7, onlyMorePopular: false
 FuzzySuggester          queries: 50001, time[ms]: 3572 [+- 14.58], ~kQPS: 14
 AnalyzingSuggester      queries: 50001, time[ms]: 251 [+- 4.99], ~kQPS: 199
 AnalyzingInfixSuggester queries: 50001, time[ms]: 502 [+- 12.07], ~kQPS: 100
 JaspellLookup           queries: 50001, time[ms]: 57 [+- 3.38], ~kQPS: 873
 TSTLookup               queries: 50001, time[ms]: 27 [+- 1.74], ~kQPS: 1851
 FSTCompletionLookup     queries: 50001, time[ms]: 254 [+- 1.47], ~kQPS: 197
 WFSTCompletionLookup    queries: 50001, time[ms]: 62 [+- 3.34], ~kQPS: 807

 -- RAM consumption
 FuzzySuggester          size[B]:      765,461
 AnalyzingSuggester      size[B]:      765,461
 AnalyzingInfixSuggester size[B]:    2,228,216
 JaspellLookup           size[B]:    9,815,144
 TSTLookup               size[B]:    9,459,256
 FSTCompletionLookup     size[B]:      376,896
 WFSTCompletionLookup    size[B]:      450,384
{noformat}
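
For reading the table: the ~kQPS column looks consistent with queries divided by elapsed milliseconds (one query per millisecond is a thousand queries per second). That is an inference from the numbers, not something the benchmark output states, and a few rows differ by one, presumably because the printed times are rounded. A small Java sketch of that arithmetic, with hypothetical names ThroughputSketch, kqps and speedup:

```java
// Sketch only: reproduces the apparent relationship between the
// time[ms] and ~kQPS columns; it is not the benchmark's own code.
class ThroughputSketch {

    // Queries per millisecond equals thousands of queries per second.
    static double kqps(long queries, double elapsedMs) {
        return queries / elapsedMs;
    }

    // Ratio of two elapsed times for the same workload; a value above 1
    // means the second time is the faster one.
    static double speedup(double slowerMs, double fasterMs) {
        return slowerMs / fasterMs;
    }
}
```

With the 2-4 prefix numbers, kqps(50001, 863) is about 57.9, which rounds to the 58 shown for AnalyzingInfixSuggester, and speedup(863, 642) is about 1.34, so AnalyzingSuggester takes roughly three quarters of the time on that workload.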

                
> Add AnalyzingInfixSuggester
> ---------------------------
>
>                 Key: LUCENE-4845
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4845
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/spellchecker
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 5.0, 4.4
>
>         Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, 
> LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch
>
>
> Our current suggester impls do prefix matching of the incoming text
> against all compiled suggestions, but in some cases it's useful to
> allow infix matching.  E.g., Netflix does infix suggestions in their
> search box.
> I did a straightforward impl, just using a normal Lucene index, and
> using PostingsHighlighter to highlight matching tokens in the
> suggestions.
> I think this likely only works well when your suggestions have a
> strong prior ranking (weight input to build), e.g. Netflix knows
> the popularity of movies.
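
As a rough illustration of the idea in the description above, here is a minimal Java sketch of infix suggestion ranked by a prior weight. It is not the patch: it scans a list instead of using a Lucene index, does no highlighting, and assumes that "infix" means each query token may prefix-match any token of a suggestion, with all query tokens required. The names InfixSuggestSketch, Entry and lookup are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Toy illustration only: a linear scan, not the Lucene-index-based patch.
class InfixSuggestSketch {

    // A suggestion plus its prior weight (e.g. popularity). Hypothetical type.
    static final class Entry {
        final String text;
        final long weight;

        Entry(String text, long weight) {
            this.text = text;
            this.weight = weight;
        }
    }

    // True if any whitespace-separated token of text starts with prefix.
    static boolean anyTokenStartsWith(String text, String prefix) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Returns up to num suggestions in which every query token prefix-matches
    // some token of the suggestion, ordered by descending weight.
    static List<String> lookup(List<Entry> entries, String query, int num) {
        String[] queryTokens = query.trim().toLowerCase(Locale.ROOT).split("\\s+");
        List<Entry> hits = new ArrayList<>();
        for (Entry e : entries) {
            boolean allMatch = true;
            for (String q : queryTokens) {
                if (!anyTokenStartsWith(e.text, q)) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                hits.add(e);
            }
        }
        // The prior weight decides the order; ties keep input order.
        hits.sort(Comparator.comparingLong((Entry e) -> e.weight).reversed());
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(num, hits.size()); i++) {
            out.add(hits.get(i).text);
        }
        return out;
    }
}
```

For example, with made-up entries ("The Dark Knight", 90), ("Knight and Day", 40) and ("Dark City", 60), lookup(entries, "kni", 5) returns "The Dark Knight" followed by "Knight and Day". A prefix-only suggester would miss the first of these, and the weight is what puts it on top, which is why a strong prior ranking matters here.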

