[
https://issues.apache.org/jira/browse/LUCENE-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346900#comment-14346900
]
Jan Høydahl commented on LUCENE-6336:
-------------------------------------
bq. Not a bug: DocumentDictionary etc suggests documents, not terms.
What you say implies that the field used for the suggested terms must be
100% unique across the main document index. Yet suggesters are typically used
to suggest, e.g., authors, languages or categories. Indeed, the example in
https://cwiki.apache.org/confluence/display/solr/Suggester does just that: it
suggests categories from a {{DocumentDictionary}}, using the price field as
weight. {{FuzzySuggester}}, however, is not index-based, so it does not reveal
this bug.
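To make the point concrete, here is a minimal Java sketch, assuming (as the quoted comment says) that a document-based dictionary emits one entry per document. The {{Doc}} and {{Entry}} records and {{entriesFor}} are hypothetical stand-ins, not Lucene's {{DocumentDictionary}} API. Two documents sharing a category yield two entries for the same term, which is exactly where duplicate suggestions come from.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a document-based dictionary: NOT Lucene's
// DocumentDictionary API, just the "one entry per document" idea.
public class PerDocumentEntries {

    // A document with a suggest field (category) and a weight field (price).
    record Doc(String category, long price) {}

    // One dictionary entry: the text to suggest and its weight.
    record Entry(String term, long weight) {}

    // Emits one entry per document, so a category value that occurs in
    // several documents becomes several entries with the same term.
    static List<Entry> entriesFor(List<Doc> docs) {
        List<Entry> entries = new ArrayList<>();
        for (Doc d : docs) {
            entries.add(new Entry(d.category(), d.price()));
        }
        return entries;
    }

    public static void main(String[] args) {
        List<Entry> entries = entriesFor(List.of(
                new Doc("electronics", 300),
                new Doc("electronics", 150),
                new Doc("books", 20)));
        // Three documents, three entries: "electronics" appears twice.
        System.out.println(entries.size()); // prints 3
    }
}
```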
> AnalyzingInfixSuggester needs duplicate handling
> ------------------------------------------------
>
> Key: LUCENE-6336
> URL: https://issues.apache.org/jira/browse/LUCENE-6336
> Project: Lucene - Core
> Issue Type: Bug
> Affects Versions: 4.10.3, 5.0
> Reporter: Jan Høydahl
> Fix For: Trunk, 5.1
>
> Attachments: LUCENE-6336.patch
>
>
> Spinoff from LUCENE-5833, but otherwise unrelated.
> This concerns {{AnalyzingInfixSuggester}}, which is backed by a Lucene index
> and stores the payload and score together with the suggestion text.
> I did some testing with Solr, building the DocumentDictionary from an index
> with multiple documents that contain the same text but have random weights
> between 0 and 100. The result was identical duplicate suggestions, sorted by
> weight:
> {code}
> {
>   "suggest":{"languages":{
>       "engl":{
>         "numFound":101,
>         "suggestions":[{
>             "term":"<b>Engl</b>ish",
>             "weight":100,
>             "payload":"0"},
>           {
>             "term":"<b>Engl</b>ish",
>             "weight":99,
>             "payload":"0"},
>           {
>             "term":"<b>Engl</b>ish",
>             "weight":98,
>             "payload":"0"},
> ---etc all the way down to 0---
> {code}
> I also reproduced the same behavior using {{AnalyzingInfixSuggester}}
> directly. Some duplicate removal is therefore needed, either while building
> the local suggest index or during lookup: for a given term, only the
> highest-weight suggestion should be returned.
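A minimal Java sketch of the proposed rule, assuming suggestions are plain (term, weight, payload) values: duplicates are collapsed per term, keeping the highest weight, and the survivors are re-sorted by weight. The {{Suggestion}} record and {{dedupKeepMax}} are hypothetical illustrations; this is not the attached patch or a Lucene API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: collapse suggestions sharing the same term, keeping
// only the highest-weight one. Suggestion is a stand-in, not a Lucene class.
public class DedupSuggestions {

    record Suggestion(String term, long weight, String payload) {}

    static List<Suggestion> dedupKeepMax(List<Suggestion> input) {
        Map<String, Suggestion> best = new LinkedHashMap<>();
        for (Suggestion s : input) {
            // merge() inserts s if the term is new; otherwise it keeps
            // whichever of the two suggestions has the higher weight.
            best.merge(s.term(), s,
                    (old, cur) -> cur.weight() > old.weight() ? cur : old);
        }
        // Re-sort the survivors by weight, highest first.
        List<Suggestion> out = new ArrayList<>(best.values());
        out.sort(Comparator.comparingLong(Suggestion::weight).reversed());
        return out;
    }

    public static void main(String[] args) {
        // Mirrors the Solr output above: same term, descending weights.
        List<Suggestion> raw = List.of(
                new Suggestion("English", 100, "0"),
                new Suggestion("English", 99, "0"),
                new Suggestion("English", 98, "0"));
        for (Suggestion s : dedupKeepMax(raw)) {
            System.out.println(s.term() + " " + s.weight()); // prints "English 100"
        }
    }
}
```

The same collapse could in principle run over the dictionary entries before the suggest index is built, or over the results after lookup. In the latter case fewer than the requested number of suggestions may remain, so a caller would likely need to over-fetch.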
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)