[ https://issues.apache.org/jira/browse/SOLR-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13853940#comment-13853940 ]
Jan Høydahl commented on SOLR-3085:
-----------------------------------
bq. Environments without stopwords still have a problem with mm. Consider your
q=A horse in a stable. With mm=2 we get all kinds of documents, usually every
document in the corpus (the terms "in" and "a" alone satisfy mm=2). Ideally this
or another parameter would only require horse and stable.
The mm.autoRelax param is designed to tackle one of the most common situations,
where your qf includes a bunch of "text" fields with stopword removal plus one
or more "string" fields like "id" or "tags". Take the example of {{qf=title
body tags}}, where title and body remove stopwords but tags does not. With
mm=100%, this would translate to something like
{code}
(DMQ(tags:a) DMQ(title:horse | body:horse | tags:horse) DMQ(tags:in)
DMQ(tags:a) DMQ(title:stable | body:stable | tags:stable))~5
{code}
Very often in these cases the "tags" field does not contain free text, so
tags:a and tags:in never match and we always get 0 hits. Relaxing mm to 2,
one clause per non-stopword term, would help here.
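To make the clause counting concrete, here is a minimal Java sketch (not Solr's actual ExtendedDismaxQParser code) that models each DMQ clause as the list of qf fields that still contain the term after stopword removal. It assumes, as a simplification, that autoRelax counts only the clauses with the largest number of disjuncts and treats the smaller ones as stopword leftovers. The names `buildClauses` and `relaxedClauseCount` are hypothetical, and query terms are assumed to be already lowercased.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class AutoRelaxSketch {

    /** One clause per query term: the qf fields that keep the term after their own stopword filtering. */
    static List<List<String>> buildClauses(List<String> terms, Map<String, Set<String>> stopwordsByField,
                                           List<String> qf) {
        List<List<String>> clauses = new ArrayList<>();
        for (String term : terms) {
            List<String> fields = new ArrayList<>();
            for (String field : qf) {
                // A field without a stopword list (e.g. "tags") keeps every term.
                if (!stopwordsByField.getOrDefault(field, Set.of()).contains(term)) {
                    fields.add(field);
                }
            }
            if (!fields.isEmpty()) {
                clauses.add(fields);
            }
        }
        return clauses;
    }

    /** Hypothetical relaxation: count only clauses with the maximum number of disjuncts. */
    static int relaxedClauseCount(List<List<String>> clauses) {
        int max = 0;
        int count = 0;
        for (List<String> clause : clauses) {
            if (clause.size() > max) {
                max = clause.size();
                count = 1;
            } else if (clause.size() == max) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Set<String> english = Set.of("a", "in");
        // title and body remove stopwords; tags has no stopword filter.
        Map<String, Set<String>> stopwords = Map.of("title", english, "body", english);
        List<String> qf = List.of("title", "body", "tags");
        List<String> terms = List.of("a", "horse", "in", "a", "stable");

        List<List<String>> clauses = buildClauses(terms, stopwords, qf);
        System.out.println(clauses.size());              // 5 clauses, so mm=100% gives ~5
        System.out.println(relaxedClauseCount(clauses)); // 2: only the horse and stable clauses count
    }
}
```

The relaxed count matches the mm=2 above: the three tags-only clauses are ignored when computing how many clauses must match.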
But for cases where you query multiple English-analyzed text fields with
different stopword lists, relaxing mm is not the cure. The cure is rather
to add the same stopword handling to all those text fieldTypes.
Clearly mm.autoRelax is not a complete solution for all mm issues, and other
cases may need other cures. One idea I had the other day is a param
{{mergeStopwords=true}}, which would modify the "query" analysis chain of each
field in {{qf}} to include the StopFilters of all the {{qf}} fields. I.e.
if my field A has {{stopwords="a.txt"}} and field B has {{stopwords="b.txt"}},
then edismax would add those two stop filters in a row for both fields,
much the same way that edismax removes the StopFilter when doing smart stopword
handling.
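The proposed {{mergeStopwords=true}} behaviour can be approximated in a few lines. This is a minimal Java sketch under stated assumptions: the param does not exist yet, the stopword contents for a.txt and b.txt are made up, and applying two stop filters in a row is modelled as filtering the query terms against the union of both lists (equivalent for a plain lowercased token stream). `mergedStopwords` and `analyzeQuery` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MergeStopwordsSketch {

    /** Union of every qf field's stopword list (models chaining all StopFilters). */
    static Set<String> mergedStopwords(Map<String, Set<String>> stopwordsByField) {
        Set<String> merged = new HashSet<>();
        for (Set<String> words : stopwordsByField.values()) {
            merged.addAll(words);
        }
        return merged;
    }

    /** Drops every query term in the merged list, so all fields see the same terms. */
    static List<String> analyzeQuery(List<String> terms, Set<String> merged) {
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            if (!merged.contains(term)) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical contents of a.txt (field A) and b.txt (field B).
        Map<String, Set<String>> stopwords = Map.of(
                "A", Set.of("a", "the"),
                "B", Set.of("in", "of"));
        List<String> terms = List.of("a", "horse", "in", "a", "stable");

        List<String> kept = analyzeQuery(terms, mergedStopwords(stopwords));
        System.out.println(kept); // [horse, stable]
    }
}
```

Because every field drops the same terms, each remaining clause spans both fields and mm=100% requires exactly horse and stable.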
> Fix the dismax/edismax stopwords mm issue
> -----------------------------------------
>
> Key: SOLR-3085
> URL: https://issues.apache.org/jira/browse/SOLR-3085
> Project: Solr
> Issue Type: Bug
> Components: query parsers
> Reporter: Jan Høydahl
> Assignee: Jan Høydahl
> Labels: MinimumShouldMatch, dismax, edismax, stopwords
> Fix For: 5.0, 4.7
>
> Attachments: SOLR-3085.patch, SOLR-3085.patch, SOLR-3085.patch
>
>
> As discussed here http://search-lucene.com/m/Wr7iz1a95jx and here
> http://search-lucene.com/m/Yne042qEyCq1 and here
> http://search-lucene.com/m/RfAp82nSsla DisMax has an issue with stopwords if
> not all fields used in QF have exactly the same stopword lists.
> Typical solutions are to not use stopwords, to harmonize stopword lists across
> all fields in your QF, or to relax the MM to a lower percentage. Sometimes these
> are not acceptable workarounds, and we should find a better solution.
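One reason a lower mm percentage is an unsatisfying workaround is that the same percentage applies to every query, whether or not it contains stopwords. The Java sketch below is a simplified model of a positive-percentage mm spec, rounded down as Solr documents for positive percentages. It ignores negative percentages and conditional specs, and `requiredClauses` is a hypothetical name, not Solr's API.

```java
public class PercentMmSketch {

    /** Simplified positive-percentage mm: required clause count, rounded down by integer division. */
    static int requiredClauses(int optionalClauses, int percent) {
        return optionalClauses * percent / 100;
    }

    public static void main(String[] args) {
        // "A horse in a stable" against mismatched stopword lists: 5 clauses, 2 content terms.
        System.out.println(requiredClauses(5, 40)); // 2
        // The same 40% on a query with no stopwords at all, e.g. "horse stable".
        System.out.println(requiredClauses(2, 40)); // 0, so either term alone is enough
    }
}
```

A percentage tuned for stopword-heavy queries therefore over-relaxes short queries that never had the problem.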
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)