[ 
https://issues.apache.org/jira/browse/SOLR-16436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17619017#comment-17619017
 ] 

Chris M. Hostetter commented on SOLR-16436:
-------------------------------------------

{quote}should we include this to 9.1 release?
{quote}
When I committed this and backported to 9x I was under the (mistaken?) 
impression that the RC process had already started for 9.1, and I didn't want 
to complicate it.

I don't think this is a particularly "high risk" issue to backport, but I also 
don't see it as a "severe impact" issue that needs to have a fix released 
urgently... so on balance I'd rather just stay out of the way and not risk 
complicating other more important backports currently in process.

If you, as the RM, feel like it's an easy piece of low hanging fruit and want 
to go ahead and backport it – go for it.

 

 

> DirectSolrSpellChecker: maxQueryFrequency bug in multi-shard 
> -------------------------------------------------------------
>
>                 Key: SOLR-16436
>                 URL: https://issues.apache.org/jira/browse/SOLR-16436
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: spellchecker
>            Reporter: Chris M. Hostetter
>            Assignee: Chris M. Hostetter
>            Priority: Major
>             Fix For: main (10.0), 9.2
>
>         Attachments: SOLR-16436-1.patch, SOLR-16436.patch
>
>
> {{DirectSolrSpellChecker}} has some very confusing/unexpected behavior when:
>  * {{maxQueryFrequency}} is configured
>  * In a multi-shard collection
>  * Using {{thresholdTokenFrequency}} or {{spellcheck.onlyMorePopular=true}} 
> or {{spellcheck.alternativeTermCount}}
>  ** (ie: anything that causes {{SuggestMode != SUGGEST_WHEN_NOT_IN_INDEX}} so 
> suggestions are possible even for terms in the index)
> The nature of the unexpected behavior varies depending on whether 
> {{maxQueryFrequency}} is configured as a float less than 1 (ie: a percentage 
> relative to the maxDocs in the index) or an integer greater than 1 (ie: an 
> absolute max frequency):
>  * When {{maxQueryFrequency < 1}} (ie: "percentage of maxDocs")
>  ** It's possible to get "false negative" suggestions
>  *** ie: a term that _should_ generate suggestions (and would in an 
> equivalent single-shard deployment) *does not*
>  ** A term from the original query may exist in fewer total documents than 
> the configured {{maxQueryFrequency}} percentage across the entire 
> collection, but will still not return suggestions
>  ** This can happen if a term exists in more than the configured 
> {{maxQueryFrequency}} percentage of docs on _one (or more)_ individual shards
>  *** As long as at least one shard says the term is "correctly spelled" 
> (which is what {{DirectSolrSpellChecker}} decides when the 
> {{maxQueryFrequency}} threshold is met) then the merge logic ignores any 
> suggestions that might come from other shards
>  * When {{1 < maxQueryFrequency}} (ie "absolute value")
>  ** It's possible to get "false positive" suggestions
>  *** ie: a term that _should not_ generate suggestions (and would not in an 
> equivalent single-shard deployment) *does*
>  ** A term from the original query may exist in more total documents in the 
> collection than the configured {{maxQueryFrequency}} but will still return 
> suggestions
>  ** This can happen if a term exists in fewer than the configured 
> {{maxQueryFrequency}} number of docs on _every_ individual shard
>  *** Since no shard says the term is "correctly spelled", the suggestions are 
> merged and returned
>  *** No aspect of the code considers the possibility that the sum of the 
> {{origFreq}} returned by all shards might be higher than the specified 
> {{maxQueryFrequency}}
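
The two failure modes described above can be illustrated with a small toy model. This is only a sketch, not Solr's actual merge code: the `Shard` class, the function names, and the example numbers are all hypothetical, and it assumes a term counts as "correctly spelled" when its document frequency is strictly greater than the threshold, with values of 1 or more treated as absolute counts. It compares the per-shard decision described in the issue against a decision made once on collection-wide totals.

```python
from dataclasses import dataclass


@dataclass
class Shard:
    """Hypothetical stand-in for one shard's view of a query term."""
    max_doc: int    # number of documents on this shard
    term_freq: int  # number of those documents containing the term


def exceeds(freq: int, max_doc: int, max_query_frequency: float) -> bool:
    """True if the term is frequent enough to be treated as correctly spelled.

    Assumption: values below 1 are a fraction of max_doc, values of 1 or
    more are an absolute document count, and the comparison is strict.
    """
    if max_query_frequency < 1:
        return freq > max_query_frequency * max_doc
    return freq > max_query_frequency


def suggests_per_shard(shards, max_query_frequency) -> bool:
    """Behavior described in the issue: each shard decides on its own, and
    a single "correctly spelled" verdict suppresses all suggestions."""
    return not any(
        exceeds(s.term_freq, s.max_doc, max_query_frequency) for s in shards
    )


def suggests_collection_wide(shards, max_query_frequency) -> bool:
    """What an equivalent single-shard deployment would decide."""
    total_freq = sum(s.term_freq for s in shards)
    total_docs = sum(s.max_doc for s in shards)
    return not exceeds(total_freq, total_docs, max_query_frequency)


# Percentage case (0.5 = 50%): the term is in 80% of a small shard but
# only 10% of the whole collection, so suggestions are wrongly suppressed.
pct_shards = [Shard(max_doc=10, term_freq=8), Shard(max_doc=90, term_freq=2)]

# Absolute case (limit of 10 docs): 7 docs on each shard stays under the
# limit per shard, but the total of 14 exceeds it.
abs_shards = [Shard(max_doc=100, term_freq=7), Shard(max_doc=100, term_freq=7)]

print(suggests_per_shard(pct_shards, 0.5), suggests_collection_wide(pct_shards, 0.5))
print(suggests_per_shard(abs_shards, 10), suggests_collection_wide(abs_shards, 10))
```

In this model the percentage case yields a "false negative" (no suggestions per shard, suggestions collection-wide) and the absolute case yields a "false positive" (the reverse). The collection-wide function is only a reference point for the comparison; it is not meant to describe how the attached patches fix the problem.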



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
