[ https://issues.apache.org/jira/browse/LUCENE-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676877#comment-16676877 ]
Samuel Solís commented on LUCENE-6336:
--------------------------------------
Hi,
I'm a new Solr user and this is my first comment on an issue, so apologies if
I get some of the details wrong.
I have built a suggest system like the one described in this issue and I see
exactly the same problem. I have configured a BlendedInfixLookupFactory on a
multivalued field, with DocumentExpressionDictionaryFactory as the
dictionaryImpl. The suggestions contain duplicates of the same term whenever
the weights differ, which I think is bad behavior. The idea of removing
duplicates with params like "_unique=true and weightCalculus=max|min|avg_"
seems nice.
I know the issue was filed against 5.0, but I'm using 6.6 and the problem is
still there and unresolved. How can I help? I'm a developer, but I don't use
Java; still, I can test a patch or write tests if that is useful. Or, if
somebody knows a better solution, I'd be glad to discuss it.
Thanks!
> AnalyzingInfixSuggester needs duplicate handling
> ------------------------------------------------
>
> Key: LUCENE-6336
> URL: https://issues.apache.org/jira/browse/LUCENE-6336
> Project: Lucene - Core
> Issue Type: Bug
> Affects Versions: 4.10.3, 5.0
> Reporter: Jan Høydahl
> Assignee: Jan Høydahl
> Priority: Major
> Labels: lookup, suggester
> Attachments: LUCENE-6336.patch
>
>
> Spinoff from LUCENE-5833, but otherwise unrelated.
> {{AnalyzingInfixSuggester}} is backed by a Lucene index and stores the
> payload and score together with the suggest text.
> I did some testing with Solr, producing the DocumentDictionary from an index
> with multiple documents containing the same text, but with random weights
> between 0 and 100. I then got identical duplicate suggestions, sorted by weight:
> {code}
> {
>   "suggest":{"languages":{
>     "engl":{
>       "numFound":101,
>       "suggestions":[{
>           "term":"<b>Engl</b>ish",
>           "weight":100,
>           "payload":"0"},
>         {
>           "term":"<b>Engl</b>ish",
>           "weight":99,
>           "payload":"0"},
>         {
>           "term":"<b>Engl</b>ish",
>           "weight":98,
>           "payload":"0"},
> ---etc all the way down to 0---
> {code}
> I also reproduced the same behavior in AnalyzingInfixSuggester directly. So
> there is a need for some duplicate removal here, either while building the
> local suggest index or during lookup. Only the highest weight suggestion for
> a given term should be returned.
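
Until the suggester handles this itself, the duplicate removal described above can be approximated on the client side by post-processing the suggester response. The following is only a minimal sketch in Python, not part of Lucene or Solr: the function name `dedupe_suggestions` and its `weight_calculus` argument are hypothetical (the latter mirrors the max|min|avg idea proposed in the comment), and the input is assumed to be the `suggestions` list from a response shaped like the JSON above. It groups on the exact `term` string and, as a further assumption, keeps the payload of the highest-weight entry in each group.

```python
def dedupe_suggestions(suggestions, weight_calculus="max"):
    """Collapse suggestions that share the same term into a single entry.

    suggestions: list of dicts with "term", "weight" and "payload" keys,
    as in the suggester response shown in the issue description.
    weight_calculus: hypothetical option, one of "max", "min" or "avg".
    """
    reducers = {
        "max": max,
        "min": min,
        "avg": lambda weights: sum(weights) / len(weights),
    }
    if weight_calculus not in reducers:
        raise ValueError("weight_calculus must be one of max, min, avg")

    # Group entries by term; dicts keep first-seen order of the terms.
    groups = {}
    for suggestion in suggestions:
        groups.setdefault(suggestion["term"], []).append(suggestion)

    merged = []
    for term, group in groups.items():
        # Assumption: the payload is taken from the highest-weight entry.
        best = max(group, key=lambda s: s["weight"])
        weight = reducers[weight_calculus]([s["weight"] for s in group])
        merged.append({"term": term, "weight": weight, "payload": best["payload"]})

    # Highest weight first; the sort is stable, so ties keep first-seen order.
    merged.sort(key=lambda s: s["weight"], reverse=True)
    return merged
```

With the default `weight_calculus="max"`, the three "<b>Engl</b>ish" entries shown above (weights 100, 99 and 98) collapse into a single suggestion with weight 100. Note that `numFound` and the requested suggestion count are not adjusted by this sketch, so a client may need to request more suggestions than it intends to display.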