[ 
https://issues.apache.org/jira/browse/SOLR-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198347#comment-13198347
 ] 

Jan Høydahl commented on SOLR-3085:
-----------------------------------

You're right that technically it's not marked as required, but in the context 
of this "feature" we're discussing, the reason why people get 0 hits is that 
mm=100%, counted from all (SHOULD) clauses. And that means effectively that 
alltags:the is required.

What James suggested, and what most people tricked by this "feature" expect, 
is that if "the" is a stopword for one of the qf fields, it becomes optional in 
some way.

So how can we get that end result? First we need a way to safely detect that 
we're in this scenario, perhaps by inspecting whether each DisMax clause 
contains a field query for every field listed in qf. If one or more are missing, 
we can assume that the query term is a stopword in one or more of the fields. 
Then, one way may be to reduce the mm count accordingly, so that in our case 
above, when we detect that the DisMax clause for "the" does not contain 
"title_en", we do mm=mm-1, which will give us an mm of 1 instead of 2, and we'll 
get hits. This is probably the easiest solution.
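
To make the idea concrete, here is a rough sketch in plain Python (not Solr or 
Lucene code). It models each DisMax clause only as the set of fields it holds a 
field query for. The function names are made up for illustration, and the second 
clause stands for some hypothetical non-stopword term; flooring the result at 0 
is also just an assumption.

```python
from typing import List, Set


def count_stopword_clauses(clauses: List[Set[str]], qf: Set[str]) -> int:
    """Count clauses that lack a field query for at least one qf field.

    Such a clause is assumed to belong to a term that was removed as a
    stopword by the analyzer of the missing field(s).
    """
    return sum(1 for fields in clauses if qf - fields)


def adjusted_mm(mm: int, clauses: List[Set[str]], qf: Set[str]) -> int:
    """Lower mm by one for every suspected stopword clause (never below 0)."""
    return max(mm - count_stopword_clauses(clauses, qf), 0)


# Fields from the discussion: "the" is a stopword for title_en only.
qf = {"title_en", "alltags"}
clauses = [
    {"alltags"},              # clause for "the": no title_en field query
    {"title_en", "alltags"},  # clause for a hypothetical non-stopword term
]
mm = len(clauses)  # mm=100% counted over all (SHOULD) clauses

print(adjusted_mm(mm, clauses, qf))
```

With the two clauses above, mm drops from 2 to 1, which is the mm=mm-1 case 
described in the comment.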

Another way would be to keep mm as is, and move the affected clause out of the 
BooleanQuery and add it as a BoostQuery instead?

This behavior should be parameter driven, e.g. {{&mm.sw=false}} reading 
"Minimum should match does not require Stop Words"
                
> Fix the dismax/edismax stopwords mm issue
> -----------------------------------------
>
>                 Key: SOLR-3085
>                 URL: https://issues.apache.org/jira/browse/SOLR-3085
>             Project: Solr
>          Issue Type: Bug
>          Components: search
>            Reporter: Jan Høydahl
>              Labels: MinimumShouldMatch, dismax, stopwords
>             Fix For: 3.6, 4.0
>
>
> As discussed here http://search-lucene.com/m/Wr7iz1a95jx and here 
> http://search-lucene.com/m/Yne042qEyCq1 and here 
> http://search-lucene.com/m/RfAp82nSsla DisMax has an issue with stopwords if 
> not all fields used in QF have exactly the same stopword lists.
> The typical solution is to not use stopwords, to harmonize stopword lists 
> across all fields in your QF, or to relax the MM to a lower percentage. 
> Sometimes these are not acceptable workarounds, and we should find a better 
> solution.



