Jan Høydahl created LUCENE-6336:
-----------------------------------

             Summary: AnalyzingInfixSuggester needs duplicate handling
                 Key: LUCENE-6336
                 URL: https://issues.apache.org/jira/browse/LUCENE-6336
             Project: Lucene - Core
          Issue Type: Bug
    Affects Versions: 5.0, 4.10.3
            Reporter: Jan Høydahl
             Fix For: Trunk, 5.1


Spinoff from LUCENE-5833, but otherwise unrelated.

This concerns {{AnalyzingInfixSuggester}}, which is backed by a Lucene index 
and stores the payload and score together with the suggest text.

I did some testing with Solr, building the DocumentDictionary from an index 
in which multiple documents contain the same text but have random weights 
between 0 and 100. The result was identical suggestions, repeated once per 
weight and sorted by weight:
{code}
{
  "suggest":{"languages":{
      "engl":{
        "numFound":101,
        "suggestions":[{
            "term":"<b>Engl</b>ish",
            "weight":100,
            "payload":"0"},
          {
            "term":"<b>Engl</b>ish",
            "weight":99,
            "payload":"0"},
          {
            "term":"<b>Engl</b>ish",
            "weight":98,
            "payload":"0"},
---etc all the way down to 0---
{code}

I also reproduced the same behavior using {{AnalyzingInfixSuggester}} 
directly, so this is not Solr-specific. Some duplicate removal is needed 
here, either while building the local suggest index or during lookup. Only 
the highest-weight suggestion for a given term should be returned.
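
To illustrate the lookup-time variant, here is a minimal sketch in plain Java. It is not the suggester's actual code and does not use any Lucene API: the {{Suggestion}} class and the {{dedupeKeepHighestWeight}} method are hypothetical names made up for this example, and it assumes that two results count as duplicates when their term text is equal.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SuggestionDedup {

    /** Hypothetical stand-in for one lookup result: suggest text, weight, payload. */
    public static final class Suggestion {
        public final String term;
        public final long weight;
        public final String payload;

        public Suggestion(String term, long weight, String payload) {
            this.term = term;
            this.weight = weight;
            this.payload = payload;
        }
    }

    /**
     * Keeps only the highest-weight suggestion for each distinct term and
     * returns the survivors sorted by descending weight.
     */
    public static List<Suggestion> dedupeKeepHighestWeight(List<Suggestion> results) {
        Map<String, Suggestion> best = new LinkedHashMap<>();
        for (Suggestion s : results) {
            // Keep whichever of the stored and incoming entry has the larger weight.
            best.merge(s.term, s, (a, b) -> b.weight > a.weight ? b : a);
        }
        List<Suggestion> out = new ArrayList<>(best.values());
        out.sort(Comparator.comparingLong((Suggestion s) -> s.weight).reversed());
        return out;
    }
}
```

Doing the same thing at build time would instead mean keeping a single entry per term when the local suggest index is written. Which of the two fits better is left open here.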



