[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869267#comment-13869267 ]
Andrew Buchanan commented on SOLR-2649:
---------------------------------------
I'm taking a look at fixing this one.
I've traced this through the code history, back into the old Solr
repository. It looks like it was originally submitted this way by Yonik
Seeley as SOLR-1553. Any earlier history that might explain the reasoning
would presumably be in Lucid Imagination's source control system, which I
don't have access to. The DisMax parser on which edismax was based simply used
the MM value as passed in, as has been noted previously.
Hoss Man refers to this behavior as a bug at
https://issues.apache.org/jira/browse/SOLR-1553?focusedCommentId=12871244&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12871244
on the original SOLR-1553.
If you force doMinMatched = true in ExtendedDismaxQParser to disable this
logic, everything seems to work as expected in the scenario above, with the
exception of one failing test case
(TestExtendedDismaxParser.testCJKStructured). This test case was added in
r1406437 by Robert Muir for SOLR-3589 ("Edismax parser does not honor mm
parameter if analyzer splits a token").
The last query in that test case is "大亚湾 OR bogus" with mm=100%, which the
test expects to parse to
"+((((standardtok:大 standardtok:亚 standardtok:湾)~3)) (standardtok:bogus))".
Robert's comment on the test says it should "always apply minShouldMatch to the
inner booleanqueries created from whitespace, as these are never structured
lucene queries but only come from unstructured text". Looking at that query,
though, it seems to me that it should instead parse to
"+(((((standardtok:大 standardtok:亚 standardtok:湾)~3)) (standardtok:bogus))~2)",
that is, applying mm to the top-level clauses as well as to the inner one. I'm
certainly not a CJK language expert, though, so there may be a subtlety here
I'm unaware of.
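The ~3 and ~2 in those parse strings are the minimum-should-match counts that
mm=100% yields for three and two optional clauses respectively. A simplified
Java sketch of that arithmetic follows. The class and method names are
hypothetical, and only plain integer and percentage specs are handled (Solr's
conditional specs such as "3<90%" are omitted). It is loosely modelled on the
behaviour of SolrPluginUtils.calculateMinShouldMatch, not copied from it.

```java
// Simplified, hypothetical sketch of turning an mm spec into a clause count.
// Assumption: only plain integers ("2", "-1") and plain percentages
// ("50%", "-25%") are supported; conditional specs like "3<90%" are not.
public class MinShouldMatchSketch {

    static int minShouldMatch(int optionalClauses, String spec) {
        spec = spec.trim();
        int result;
        if (spec.endsWith("%")) {
            int percent = Integer.parseInt(spec.substring(0, spec.length() - 1));
            // Integer division truncates toward zero, so 50% of 3 clauses is 1.
            int calc = optionalClauses * percent / 100;
            // A negative percentage means "all but this share of the clauses".
            result = percent < 0 ? optionalClauses + calc : calc;
        } else {
            int calc = Integer.parseInt(spec);
            // A negative integer means "all but this many clauses".
            result = calc < 0 ? optionalClauses + calc : calc;
        }
        // Never require fewer than 0 or more than the available clauses.
        return Math.max(0, Math.min(result, optionalClauses));
    }

    public static void main(String[] args) {
        // Inner query built from the single CJK token: 3 SHOULD clauses.
        System.out.println(minShouldMatch(3, "100%")); // 3, i.e. ~3
        // Top level of the "... OR bogus" query: 2 SHOULD clauses.
        System.out.println(minShouldMatch(2, "100%")); // 2, i.e. ~2
        // "stocks oil gold" with mm=50%: 3 SHOULD clauses.
        System.out.println(minShouldMatch(3, "50%"));  // 1
    }
}
```

Under this reading, honoring mm=100% at the top level simply means requiring
both of the two top-level SHOULD clauses.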
I can put together a patch with some test cases to make this behave as folks
here seem to expect, but I would first like some clarification from Robert, if
possible, on whether he agrees that the existing test case should change.
> MM ignored in edismax queries with operators
> --------------------------------------------
>
> Key: SOLR-2649
> URL: https://issues.apache.org/jira/browse/SOLR-2649
> Project: Solr
> Issue Type: Bug
> Components: query parsers
> Reporter: Magnus Bergmark
> Priority: Minor
> Fix For: 4.7
>
>
> Hypothetical scenario:
> 1. User searches for "stocks oil gold" with MM set to "50%"
> 2. User adds "-stockings" to the query: "stocks oil gold -stockings"
> 3. User gets no hits since MM was ignored and all terms were AND-ed
> together
> The behavior seems to be intentional, although the reason why is never
> explained:
> // For correct lucene queries, turn off mm processing if there
> // were explicit operators (except for AND).
> boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0;
> (lines 232-234 taken from
> tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java)
> This makes edismax unsuitable as a replacement for dismax; mm is one of the
> primary features of dismax.
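How the quoted doMinMatched condition reacts to the hypothetical scenario can
be sketched in Java. Only the counter names and the final boolean expression
come from the quoted source; the class name and the whitespace tokenizing are
assumptions made for illustration. The real edismax parser splits clauses far
more carefully.

```java
// Hypothetical illustration of the quoted rule: mm processing is turned off
// as soon as any OR, NOT, '+' or '-' operator appears ("AND" is exempt).
public class DoMinMatchedSketch {

    static boolean doMinMatched(String query) {
        int numOR = 0, numNOT = 0, numPluses = 0, numMinuses = 0;
        // Assumption: naive whitespace splitting stands in for edismax's clause parsing.
        for (String tok : query.trim().split("\\s+")) {
            if (tok.equals("OR")) numOR++;
            else if (tok.equals("NOT")) numNOT++;
            else if (tok.startsWith("+")) numPluses++;
            else if (tok.startsWith("-")) numMinuses++;
            // "AND" is deliberately not counted, matching the quoted comment.
        }
        return (numOR + numNOT + numPluses + numMinuses) == 0;
    }

    public static void main(String[] args) {
        System.out.println(doMinMatched("stocks oil gold"));            // true: mm applies
        System.out.println(doMinMatched("stocks oil gold -stockings")); // false: mm ignored
        System.out.println(doMinMatched("stocks AND oil"));             // true
    }
}
```

A single prohibited clause is enough to flip the flag, which is why adding
"-stockings" in step 2 silently discards mm=50% for the three remaining terms.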
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)