[
https://issues.apache.org/jira/browse/SOLR-16436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17617697#comment-17617697
]
Ishan Chattopadhyaya commented on SOLR-16436:
---------------------------------------------
[~hossman], should we include this in the 9.1 release?
> DirectSolrSpellChecker: maxQueryFrequency bug in multi-shard
> -------------------------------------------------------------
>
> Key: SOLR-16436
> URL: https://issues.apache.org/jira/browse/SOLR-16436
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: spellchecker
> Reporter: Chris M. Hostetter
> Assignee: Chris M. Hostetter
> Priority: Major
> Fix For: main (10.0), 9.2
>
> Attachments: SOLR-16436-1.patch, SOLR-16436.patch
>
>
> {{DirectSolrSpellChecker}} has some very confusing/unexpected behavior when:
> * {{maxQueryFrequency}} is configured
> * In a multi-shard collection
> * Using {{thresholdTokenFrequency}} or {{spellcheck.onlyMorePopular=true}}
> or {{spellcheck.alternativeTermCount}}
> ** (ie: anything that causes {{SuggestMode != SUGGEST_WHEN_NOT_IN_INDEX}} so
> suggestions are possible even for terms in the index)
> The nature of the unexpected behavior varies depending on whether
> {{maxQueryFrequency}} is configured as a float less than 1 (ie: a percentage
> relative to the maxDocs in the index) or an integer greater than 1 (ie: an
> absolute max frequency):
> * When {{maxQueryFrequency < 1}} (ie: "percentage of maxDocs")
> ** It's possible to get "false negative" suggestions
> *** ie: a term that _should_ generate suggestions (and would in an
> equivalent single-shard deployment) *does not*
> ** A term from the original query may exist in a smaller share of total
> documents than the configured {{maxQueryFrequency}} percentage across the
> entire collection, but still not return suggestions
> ** This can happen if a term exists in more than the configured
> {{maxQueryFrequency}} percentage of docs on _one (or more)_ individual shards
> *** As long as at least one shard says the term is "correctly spelled"
> (which is what {{DirectSolrSpellChecker}} decides when the
> {{maxQueryFrequency}} threshold is met) then the merge logic ignores any
> suggestions that might come from other shards
> * When {{1 < maxQueryFrequency}} (ie "absolute value")
> ** It's possible to get "false positive" suggestions
> *** ie: a term that _should not_ generate suggestions (and would not in an
> equivalent single-shard deployment) *does*
> ** A term from the original query may exist in more total documents in the
> collection than the configured {{maxQueryFrequency}} but will still return
> suggestions
> ** This can happen if a term exists in fewer than the configured
> {{maxQueryFrequency}} number of docs on _every_ individual shard
> *** Since no shard says the term is "correctly spelled", the suggestions are
> merged and returned
> *** No aspect of the code considers the possibility that the sum of the
> {{origFreq}} returned by all shards might be higher than the specified
> {{maxQueryFrequency}}
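
To make the two failure modes in the description concrete, here is a minimal, self-contained sketch. It is not the Solr or Lucene code: the class {{MaxQueryFrequencyDemo}}, the {{ShardStats}} holder, the method names, and the shard numbers are all hypothetical, and the threshold check is a simplification of the rule described above (values of 1 or more are treated as an absolute doc count, values below 1 as a fraction of maxDoc). It compares the "any shard says correctly spelled" merge with a decision based on summed frequencies.

```java
import java.util.List;

public class MaxQueryFrequencyDemo {

    /** Hypothetical per-shard numbers: docs containing the term, and total docs in the shard. */
    public static final class ShardStats {
        final int docFreq;
        final int maxDoc;

        public ShardStats(int docFreq, int maxDoc) {
            this.docFreq = docFreq;
            this.maxDoc = maxDoc;
        }
    }

    /** Simplified threshold check: is the term frequent enough to count as "correctly spelled"? */
    public static boolean saysCorrectlySpelled(ShardStats s, float maxQueryFrequency) {
        // Assumption: >= 1 means an absolute count, < 1 means a fraction of maxDoc.
        float threshold = maxQueryFrequency >= 1f
                ? maxQueryFrequency
                : maxQueryFrequency * s.maxDoc;
        return s.docFreq > threshold;
    }

    /** Merge as described in the issue: one shard saying "correctly spelled" suppresses suggestions. */
    public static boolean mergedAsDescribed(List<ShardStats> shards, float maxQueryFrequency) {
        return shards.stream().anyMatch(s -> saysCorrectlySpelled(s, maxQueryFrequency));
    }

    /** What an equivalent single-shard deployment would decide: apply the check to the summed stats. */
    public static boolean collectionWide(List<ShardStats> shards, float maxQueryFrequency) {
        int docFreq = shards.stream().mapToInt(s -> s.docFreq).sum();
        int maxDoc = shards.stream().mapToInt(s -> s.maxDoc).sum();
        return saysCorrectlySpelled(new ShardStats(docFreq, maxDoc), maxQueryFrequency);
    }

    public static void main(String[] args) {
        // Percentage case: 15/100 on one shard exceeds 10%, but 16/1000 overall does not.
        List<ShardStats> pct = List.of(new ShardStats(15, 100), new ShardStats(1, 900));
        System.out.println("pct merged=" + mergedAsDescribed(pct, 0.10f)
                + " collectionWide=" + collectionWide(pct, 0.10f));

        // Absolute case: 7 and 8 are each below 10, but the sum (15) is above it.
        List<ShardStats> abs = List.of(new ShardStats(7, 100), new ShardStats(8, 900));
        System.out.println("abs merged=" + mergedAsDescribed(abs, 10f)
                + " collectionWide=" + collectionWide(abs, 10f));
    }
}
```

In the percentage case the merged result treats the term as correctly spelled while the collection-wide check does not, so suggestions are wrongly suppressed (the "false negative"). In the absolute case the opposite happens: no shard objects, so suggestions are returned even though the summed frequency exceeds the limit (the "false positive").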