Jan Høydahl created LUCENE-6336:
-----------------------------------
Summary: AnalyzingInfixSuggester needs duplicate handling
Key: LUCENE-6336
URL: https://issues.apache.org/jira/browse/LUCENE-6336
Project: Lucene - Core
Issue Type: Bug
Affects Versions: 5.0, 4.10.3
Reporter: Jan Høydahl
Fix For: Trunk, 5.1
Spinoff from LUCENE-5833, but otherwise unrelated.
I am using {{AnalyzingInfixSuggester}}, which is backed by a Lucene index and
stores the payload and score together with the suggestion text.
I did some testing with Solr, building the DocumentDictionary from an index
with multiple documents containing the same text but with random weights
between 0 and 100. The result was identical suggestions repeated once per
weight, sorted by weight:
{code}
{
  "suggest":{"languages":{
    "engl":{
      "numFound":101,
      "suggestions":[{
          "term":"<b>Engl</b>ish",
          "weight":100,
          "payload":"0"},
        {
          "term":"<b>Engl</b>ish",
          "weight":99,
          "payload":"0"},
        {
          "term":"<b>Engl</b>ish",
          "weight":98,
          "payload":"0"},
---etc all the way down to 0---
{code}
I also reproduced the same behavior using AnalyzingInfixSuggester directly, so
some form of duplicate removal is needed, either while building the local
suggest index or during lookup. Only the highest-weight suggestion for a given
term should be returned.
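
To illustrate the expected behavior, here is a minimal sketch of lookup-side
duplicate removal. It is plain Java and does not use or modify any Lucene
class: {{SuggestionDedup}}, {{Suggestion}} and {{dedupe}} are hypothetical
names made up for this example, and keying on the raw term text (rather than
the highlighted text or the payload) is an assumption about what should count
as a duplicate.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: hypothetical names, not part of Lucene or Solr.
final class SuggestionDedup {

    /** Hypothetical stand-in for one lookup result (term, weight, payload). */
    public static final class Suggestion {
        public final String term;
        public final long weight;
        public final String payload;

        public Suggestion(String term, long weight, String payload) {
            this.term = term;
            this.weight = weight;
            this.payload = payload;
        }
    }

    /**
     * Keeps only the highest-weight suggestion per term.
     * The result is sorted by weight, highest first.
     */
    public static List<Suggestion> dedupe(List<Suggestion> results) {
        Map<String, Suggestion> best = new LinkedHashMap<>();
        for (Suggestion s : results) {
            Suggestion seen = best.get(s.term);
            // Assumption: two suggestions are duplicates when their term text is equal.
            if (seen == null || s.weight > seen.weight) {
                best.put(s.term, s);
            }
        }
        List<Suggestion> out = new ArrayList<>(best.values());
        out.sort((a, b) -> Long.compare(b.weight, a.weight));
        return out;
    }

    public static void main(String[] args) {
        List<Suggestion> raw = new ArrayList<>();
        for (long w = 100; w >= 98; w--) {
            raw.add(new Suggestion("English", w, "0"));
        }
        raw.add(new Suggestion("England", 50, "1"));
        for (Suggestion s : dedupe(raw)) {
            System.out.println(s.term + " " + s.weight);
        }
    }
}
```

With the sample input in {{main}}, the three "English" entries collapse into
the single one with weight 100, followed by "England" with weight 50. Doing
this at lookup time means more raw hits have to be fetched than are returned,
so removing duplicates while building the suggest index may be the better
option; the sketch only shows the intended result.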
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)