[
https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated LUCENE-4845:
---------------------------------------
Attachment: LUCENE-4845.patch
New patch: it adds the boolean allTermsRequired to the protected
finishTerms method, and fixes the ForkLastToken test case to join
the last two tokens only when the term is the same.
I also pushed the latest patch to the Jira search
(http://jirasearch.mikemccandless.com), which uses
AnalyzingInfixSuggester for the auto-suggest, and it seems to be
working.
Here are the benchmark results:
{noformat}
-- construction time
FuzzySuggester input: 50001, time[ms]: 266 [+- 9.15]
AnalyzingSuggester input: 50001, time[ms]: 270 [+- 41.81]
AnalyzingInfixSuggester input: 50001, time[ms]: 360 [+- 7.14]
JaspellLookup input: 50001, time[ms]: 22 [+- 4.23]
TSTLookup input: 50001, time[ms]: 75 [+- 1.48]
FSTCompletionLookup input: 50001, time[ms]: 127 [+- 3.34]
WFSTCompletionLookup input: 50001, time[ms]: 119 [+- 3.84]
-- prefixes: 2-4, num: 7, onlyMorePopular: false
FuzzySuggester queries: 50001, time[ms]: 2130 [+- 12.05], ~kQPS: 23
AnalyzingSuggester queries: 50001, time[ms]: 642 [+- 8.80], ~kQPS: 78
AnalyzingInfixSuggester queries: 50001, time[ms]: 863 [+- 9.50], ~kQPS: 58
JaspellLookup queries: 50001, time[ms]: 131 [+- 3.91], ~kQPS: 381
TSTLookup queries: 50001, time[ms]: 467 [+- 0.96], ~kQPS: 107
FSTCompletionLookup queries: 50001, time[ms]: 369 [+- 5.21], ~kQPS: 135
WFSTCompletionLookup queries: 50001, time[ms]: 291 [+- 4.64], ~kQPS: 172
-- prefixes: 6-9, num: 7, onlyMorePopular: false
FuzzySuggester queries: 50001, time[ms]: 3216 [+- 14.12], ~kQPS: 16
AnalyzingSuggester queries: 50001, time[ms]: 275 [+- 4.10], ~kQPS: 182
AnalyzingInfixSuggester queries: 50001, time[ms]: 656 [+- 10.20], ~kQPS: 76
JaspellLookup queries: 50001, time[ms]: 73 [+- 3.17], ~kQPS: 688
TSTLookup queries: 50001, time[ms]: 61 [+- 1.99], ~kQPS: 815
FSTCompletionLookup queries: 50001, time[ms]: 273 [+- 2.45], ~kQPS: 183
WFSTCompletionLookup queries: 50001, time[ms]: 86 [+- 3.49], ~kQPS: 579
-- prefixes: 100-200, num: 7, onlyMorePopular: false
FuzzySuggester queries: 50001, time[ms]: 3572 [+- 14.58], ~kQPS: 14
AnalyzingSuggester queries: 50001, time[ms]: 251 [+- 4.99], ~kQPS: 199
AnalyzingInfixSuggester queries: 50001, time[ms]: 502 [+- 12.07], ~kQPS: 100
JaspellLookup queries: 50001, time[ms]: 57 [+- 3.38], ~kQPS: 873
TSTLookup queries: 50001, time[ms]: 27 [+- 1.74], ~kQPS: 1851
FSTCompletionLookup queries: 50001, time[ms]: 254 [+- 1.47], ~kQPS: 197
WFSTCompletionLookup queries: 50001, time[ms]: 62 [+- 3.34], ~kQPS: 807
-- RAM consumption
FuzzySuggester size[B]: 765,461
AnalyzingSuggester size[B]: 765,461
AnalyzingInfixSuggester size[B]: 2,228,216
JaspellLookup size[B]: 9,815,144
TSTLookup size[B]: 9,459,256
FSTCompletionLookup size[B]: 376,896
WFSTCompletionLookup size[B]: 450,384
{noformat}
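For reading the query tables: the ~kQPS column appears to be the query count divided by the elapsed milliseconds, since queries per millisecond is numerically the same as thousands of queries per second. The sketch below recomputes a few rows under that assumption. The class and method names are made up for illustration and are not part of the benchmark code. Because the printed times are rounded, a recomputed value can differ from the reported one by about 1.

```java
public class ThroughputCheck {
    /**
     * Thousands of queries per second for a run of `queries` lookups
     * that took `millis` milliseconds in total.
     * queries / ms equals (queries / 1000) / (ms / 1000), i.e. kQPS.
     */
    public static double kQps(int queries, double millis) {
        return queries / millis;
    }

    public static void main(String[] args) {
        // Rows copied from the tables above (prefixes 2-4 and 100-200).
        System.out.printf("AnalyzingSuggester, prefixes 2-4: %.1f kQPS%n",
                kQps(50001, 642));
        System.out.printf("AnalyzingInfixSuggester, prefixes 2-4: %.1f kQPS%n",
                kQps(50001, 863));
        System.out.printf("AnalyzingInfixSuggester, prefixes 100-200: %.1f kQPS%n",
                kQps(50001, 502));
    }
}
```

These three rows come out at roughly 78, 58 and 100, matching the reported column.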
> Add AnalyzingInfixSuggester
> ---------------------------
>
> Key: LUCENE-4845
> URL: https://issues.apache.org/jira/browse/LUCENE-4845
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/spellchecker
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 5.0, 4.4
>
> Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch,
> LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch
>
>
> Our current suggester impls do prefix matching of the incoming text
> against all compiled suggestions, but in some cases it's useful to
> allow infix matching. E.g., Netflix does infix suggestions in their
> search box.
> I did a straightforward impl, just using a normal Lucene index, and
> using PostingsHighlighter to highlight matching tokens in the
> suggestions.
> I think this likely only works well when your suggestions have a
> strong prior ranking (weight input to build), e.g. Netflix knows
> the popularity of movies.
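
To make the idea in the description concrete, here is a minimal in-memory sketch of infix matching ranked by a prior weight. It is not the patch's implementation, which uses a normal Lucene index. The class InfixSuggestSketch, the Suggestion type, the lookup signature and the sample titles are all hypothetical. As a simplification, every query term is treated as a prefix of some token in the suggestion, and tokens are split on whitespace rather than run through an analyzer. The allTermsRequired flag here only illustrates the all-terms versus any-term distinction.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class InfixSuggestSketch {
    /** Hypothetical suggestion entry: display text plus a prior weight. */
    public static final class Suggestion {
        public final String text;
        public final long weight;

        public Suggestion(String text, long weight) {
            this.text = text;
            this.weight = weight;
        }
    }

    /** True if any whitespace-separated token of text starts with prefix. */
    static boolean anyTokenStartsWith(String text, String prefix) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Returns up to num suggestion texts, highest weight first.
     * With allTermsRequired, every query term must match some token;
     * otherwise a single matching term is enough.
     */
    public static List<String> lookup(List<Suggestion> all, String query,
                                      int num, boolean allTermsRequired) {
        String[] terms = query.toLowerCase(Locale.ROOT).trim().split("\\s+");
        List<Suggestion> hits = new ArrayList<>();
        for (Suggestion s : all) {
            int matched = 0;
            for (String term : terms) {
                if (anyTokenStartsWith(s.text, term)) {
                    matched++;
                }
            }
            boolean ok = allTermsRequired ? matched == terms.length : matched > 0;
            if (ok) {
                hits.add(s);
            }
        }
        // Rank purely by the prior weight, as the description suggests.
        hits.sort(Comparator.comparingLong((Suggestion s) -> s.weight).reversed());
        List<String> out = new ArrayList<>();
        for (Suggestion s : hits.subList(0, Math.min(num, hits.size()))) {
            out.add(s.text);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Suggestion> all = List.of(
                new Suggestion("The Dark Knight", 90),
                new Suggestion("Knight and Day", 40),
                new Suggestion("Dark City", 60));
        // "kni" matches a token in the middle of the first title.
        System.out.println(lookup(all, "kni", 5, true));
        System.out.println(lookup(all, "dark kni", 5, false));
    }
}
```

Because matches are ordered only by weight, the results are only as good as the weights supplied, which is the caveat the description raises about needing a strong prior ranking.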