[ https://issues.apache.org/jira/browse/SOLR-16436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris M. Hostetter resolved SOLR-16436.
---------------------------------------
    Fix Version/s: main (10.0)
                   9.2
       Resolution: Fixed

> DirectSolrSpellChecker: maxQueryFrequency bug in multi-shard 
> -------------------------------------------------------------
>
>                 Key: SOLR-16436
>                 URL: https://issues.apache.org/jira/browse/SOLR-16436
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: spellchecker
>            Reporter: Chris M. Hostetter
>            Assignee: Chris M. Hostetter
>            Priority: Major
>             Fix For: main (10.0), 9.2
>
>         Attachments: SOLR-16436-1.patch, SOLR-16436.patch
>
>
> {{DirectSolrSpellChecker}} has confusing and unexpected behavior when all of the following hold:
>  * {{maxQueryFrequency}} is configured
>  * The collection has multiple shards
>  * The request uses {{thresholdTokenFrequency}}, {{spellcheck.onlyMorePopular=true}}, 
> or {{spellcheck.alternativeTermCount}}
>  ** (i.e., anything that causes {{SuggestMode != SUGGEST_WHEN_NOT_IN_INDEX}}, so 
> suggestions are possible even for terms that exist in the index)
> The nature of the unexpected behavior depends on whether 
> {{maxQueryFrequency}} is configured as a float less than 1 (i.e., a percentage 
> relative to the maxDocs in the index) or as an integer greater than 1 (i.e., an 
> absolute max frequency):
>  * When {{maxQueryFrequency < 1}} (i.e., "percentage of maxDocs")
>  ** It's possible to get "false negative" suggestions
>  *** i.e., a term that _should_ generate suggestions (and would in an 
> equivalent single-shard deployment) *does not*
>  ** A term from the original query may exist in fewer total documents 
> than the configured {{maxQueryFrequency}} percentage across the entire 
> collection, yet still return no suggestions
>  ** This can happen if the term exists in more than the configured 
> {{maxQueryFrequency}} percentage of docs on _one (or more)_ individual shards
>  *** As long as at least one shard says the term is "correctly spelled" 
> (which is what {{DirectSolrSpellChecker}} decides when the 
> {{maxQueryFrequency}} threshold is met), the merge logic ignores any 
> suggestions that might come from other shards
>  * When {{1 < maxQueryFrequency}} (i.e., "absolute value")
>  ** It's possible to get "false positive" suggestions
>  *** i.e., a term that _should not_ generate suggestions (and would not in an 
> equivalent single-shard deployment) *does*
>  ** A term from the original query may exist in more total documents in the 
> collection than the configured {{maxQueryFrequency}}, yet still return 
> suggestions
>  ** This can happen if the term exists in fewer than the configured 
> {{maxQueryFrequency}} number of docs on _every_ individual shard
>  *** Since no shard says the term is "correctly spelled", the suggestions are 
> merged and returned
>  *** No part of the code considers the possibility that the sum of the 
> {{origFreq}} values returned by all shards might be higher than the specified 
> {{maxQueryFrequency}}
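
A minimal Java sketch of the two failure modes described above, under stated assumptions: the threshold check assumes a value >= 1 is an absolute docFreq limit and a value < 1 is a fraction of maxDoc (rounded up), and the distributed merge is modeled only as the issue describes it (any shard that considers the term "correctly spelled" suppresses all suggestions). The names MaxQueryFrequencyDemo, Shard, isAboveThreshold, distributedSuggests and singleShardSuggests are hypothetical and are not Solr or Lucene APIs; this is an illustration, not the fix committed for SOLR-16436. It requires Java 16+ for the record.

```java
import java.util.List;

/** Hypothetical model of the SOLR-16436 behavior; not a Solr or Lucene API. */
final class MaxQueryFrequencyDemo {

  /** Per-shard stats for one query term: its docFreq and the shard's maxDoc. */
  record Shard(int docFreq, int maxDoc) {}

  /**
   * Assumed threshold check: values >= 1 are an absolute doc count, values < 1
   * are a fraction of maxDoc. Returns true when the term is frequent enough to
   * be treated as "correctly spelled" (so no suggestions are produced).
   */
  static boolean isAboveThreshold(int docFreq, int maxDoc, float maxQueryFrequency) {
    if (maxQueryFrequency >= 1f) {
      return docFreq > maxQueryFrequency;
    }
    return docFreq > (int) Math.ceil(maxQueryFrequency * maxDoc);
  }

  /** Distributed behavior per the issue: any shard above the threshold suppresses all suggestions. */
  static boolean distributedSuggests(List<Shard> shards, float maxQueryFrequency) {
    for (Shard s : shards) {
      if (isAboveThreshold(s.docFreq(), s.maxDoc(), maxQueryFrequency)) {
        return false;
      }
    }
    return true;
  }

  /** Single-shard equivalent: apply the threshold once to collection-wide totals. */
  static boolean singleShardSuggests(List<Shard> shards, float maxQueryFrequency) {
    int totalFreq = 0;
    int totalMaxDoc = 0;
    for (Shard s : shards) {
      totalFreq += s.docFreq();
      totalMaxDoc += s.maxDoc();
    }
    return !isAboveThreshold(totalFreq, totalMaxDoc, maxQueryFrequency);
  }

  public static void main(String[] args) {
    // Percentage mode (0.01 = 1% of maxDoc): shard 1 has the term in 2% of its docs,
    // but collection-wide the term is in only 20 / 10000 = 0.2% of docs.
    List<Shard> skewed = List.of(new Shard(20, 1000), new Shard(0, 9000));
    System.out.println(distributedSuggests(skewed, 0.01f)); // false: the "false negative"
    System.out.println(singleShardSuggests(skewed, 0.01f)); // true

    // Absolute mode (5 docs): each shard is under the limit, but the sum (12) is over it.
    List<Shard> spread = List.of(new Shard(4, 1000), new Shard(4, 1000), new Shard(4, 1000));
    System.out.println(distributedSuggests(spread, 5f)); // true: the "false positive"
    System.out.println(singleShardSuggests(spread, 5f)); // false
  }
}
```

Running main prints false, true, true, false: in the first scenario the per-shard check drops suggestions that an equivalent single-shard index would return, and in the second it returns suggestions that a single-shard index would suppress, because no step compares the summed per-shard frequencies against the threshold.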



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
