[jira] [Commented] (LUCENE-6336) AnalyzingInfixSuggester needs duplicate handling

JIRA Tue, 06 Nov 2018 07:16:27 -0800


    [ 
https://issues.apache.org/jira/browse/LUCENE-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676877#comment-16676877
 ]


Samuel Solís commented on LUCENE-6336:
--------------------------------------

Hi,

I'm a new Solr user and this is my first comment on an issue. Sorry if I 
don't yet know enough to report this properly.

I have built a suggester like the one described in this issue and I see 
exactly the same problem. I have configured a BlendedInfixLookupFactory on a 
multivalued field, with DocumentExpressionDictionaryFactory as the 
dictionaryImpl. The suggestions contain duplicates when the weights are 
different, which I think is bad behavior. The idea of removing duplicates 
with params like "_unique=true and weightCalculus=max|min|avg_" seems nice.
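
To make the idea concrete, here is a rough sketch of the merge step as I 
imagine it. It is Python, not Java, and it is not Lucene or Solr code: the 
function name, the weight_calculus argument and the dict layout are made up 
for illustration, loosely following the shape of the suggester's JSON response.

```python
# Hypothetical aggregation modes, mirroring the proposed weightCalculus=max|min|avg.
AGGREGATORS = {
    "max": max,
    "min": min,
    "avg": lambda weights: sum(weights) / len(weights),
}


def dedupe_suggestions(suggestions, weight_calculus="max"):
    """Collapse suggestions that share the same term and payload into one entry.

    `suggestions` is a list of dicts with "term", "weight" and "payload" keys
    (an assumed layout, not a Solr API).
    """
    aggregate = AGGREGATORS[weight_calculus]

    # Collect every weight seen for each (term, payload) pair.
    weights_by_key = {}
    for s in suggestions:
        key = (s["term"], s["payload"])
        weights_by_key.setdefault(key, []).append(s["weight"])

    # Emit one suggestion per pair, carrying the aggregated weight.
    merged = [
        {"term": term, "payload": payload, "weight": aggregate(weights)}
        for (term, payload), weights in weights_by_key.items()
    ]

    # Highest weight first, as the suggester already does.
    merged.sort(key=lambda s: s["weight"], reverse=True)
    return merged
```

With weight_calculus="max" this keeps only the highest weight per term, which 
is what the issue description asks for.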

I know the issue was filed against 5.0, but I'm using 6.6 and the issue is 
still open and the problem is not resolved. How can I help? I'm a developer, 
but I don't use Java. Still, I can test a patch or write tests if that is 
useful. Or, if somebody knows a better solution, I'd be glad to discuss it.

 

Thanks!

> AnalyzingInfixSuggester needs duplicate handling
> ------------------------------------------------
>
>                 Key: LUCENE-6336
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6336
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 4.10.3, 5.0
>            Reporter: Jan Høydahl
>            Assignee: Jan Høydahl
>            Priority: Major
>              Labels: lookup, suggester
>         Attachments: LUCENE-6336.patch
>
>
> Spinoff from LUCENE-5833, but otherwise unrelated.
> I am using {{AnalyzingInfixSuggester}}, which is backed by a Lucene index and 
> stores the payload and score together with the suggestion text.
> I did some testing with Solr, producing the DocumentDictionary from an index 
> with multiple documents containing the same text, but with random weights 
> between 0 and 100. I then got identical duplicate suggestions, sorted by weight:
> {code}
> {
>   "suggest":{"languages":{
>       "engl":{
>         "numFound":101,
>         "suggestions":[{
>             "term":"<b>Engl</b>ish",
>             "weight":100,
>             "payload":"0"},
>           {
>             "term":"<b>Engl</b>ish",
>             "weight":99,
>             "payload":"0"},
>           {
>             "term":"<b>Engl</b>ish",
>             "weight":98,
>             "payload":"0"},
> ---etc all the way down to 0---
> {code}
> I also reproduced the same behavior in AnalyzingInfixSuggester directly. So 
> there is a need for some duplicate removal here, either while building the 
> local suggest index or during lookup. Only the highest weight suggestion for 
> a given term should be returned.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
