[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators

Andrew Buchanan (JIRA) Sun, 12 Jan 2014 19:34:55 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869267#comment-13869267
 ]


Andrew Buchanan commented on SOLR-2649:
---------------------------------------

I'm taking a look at fixing this one.

I've tracked this all the way through the code history and back through the old
Solr repository. It looks like it was originally submitted this way by Yonik
Seeley as SOLR-1553. Any earlier history that might explain the reasoning
would presumably be in Lucid Imagination's source control system (which I don't
have access to). The DisMax parser on which it was based simply used the MM
values as passed in, as has been previously noted.

Hoss Man refers to this behavior as a bug in a comment on the original
SOLR-1553:
https://issues.apache.org/jira/browse/SOLR-1553?focusedCommentId=12871244&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12871244

If you force doMinMatched = true to disable this logic in
ExtendedDismaxQParser, everything seems to behave as expected in the scenarios
described in this issue, except for one failing test case
(TestExtendedDismaxParser.testCJKStructured). That test case was added in
r1406437 by Robert Muir for SOLR-3589 ("Edismax parser does not honor mm
parameter if analyzer splits a token").
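For reference, the gate being disabled is the doMinMatched expression quoted in
the issue description below. A minimal, hypothetical Java sketch of that logic
follows; the class name, the naive whitespace tokenization, and the "force"
flag (standing in for the experiment above) are assumptions for illustration,
not the real ExtendedDismaxQParser code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of edismax's operator-counting mm gate (not Solr code).
final class MmGateSketch {

    // Mirrors: doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0
    static boolean doMinMatched(List<String> tokens) {
        int numOR = 0, numNOT = 0, numPluses = 0, numMinuses = 0;
        for (String t : tokens) {
            if (t.equals("OR")) numOR++;
            else if (t.equals("NOT")) numNOT++;
            else if (t.startsWith("+")) numPluses++;
            else if (t.startsWith("-")) numMinuses++;
            // "AND" and plain terms are deliberately not counted
        }
        return (numOR + numNOT + numPluses + numMinuses) == 0;
    }

    // force == true models hard-coding doMinMatched = true
    static boolean doMinMatched(List<String> tokens, boolean force) {
        return force || doMinMatched(tokens);
    }

    // Naive whitespace tokenization; an assumption, the real parser differs.
    static List<String> tokenize(String q) {
        return Arrays.asList(q.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(doMinMatched(tokenize("stocks oil gold")));            // true
        System.out.println(doMinMatched(tokenize("stocks oil gold -stockings"))); // false
        System.out.println(doMinMatched(tokenize("stocks AND oil")));             // true
    }
}
```

A single "-stockings" is enough to turn mm off for the whole query, which is
the behavior the issue reports.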

The last query in that test case is "大亚湾 OR bogus" with mm=100%, which the test
expects to parse to:

  +((((standardtok:大 standardtok:亚 standardtok:湾)~3)) (standardtok:bogus))

The comment for the test from Robert Muir says it should "always apply
minShouldMatch to the inner booleanqueries created from whitespace, as these are
never structured lucene queries but only come from unstructured text". Looking
at that query, though, it seems to me that it should instead parse to:

  +(((((standardtok:大 standardtok:亚 standardtok:湾)~3)) (standardtok:bogus))~2)

i.e., also applying the MM to the top-level clauses. I'm certainly not a CJK
language expert, though, so there may be a subtlety here that I'm unaware of.
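The "~3" and "~2" above are just mm=100% applied to different clause counts:
three tokens in the inner query, two clauses at the top level. A simplified,
hypothetical sketch of turning a plain mm spec into a required-clause count is
below. It handles only "N", "-N", "P%" and "-P%"; Solr's real
SolrPluginUtils.calculateMinShouldMatch also accepts conditional specs such as
"2<75%", which this sketch ignores.

```java
// Hypothetical, simplified mm-spec evaluation (not Solr's implementation).
final class MinShouldMatchSketch {

    static int requiredClauses(int optionalClauses, String spec) {
        spec = spec.trim();
        int result;
        if (spec.endsWith("%")) {
            int percent = Integer.parseInt(spec.substring(0, spec.length() - 1));
            // integer division truncates toward zero
            int calc = optionalClauses * percent / 100;
            result = percent < 0 ? optionalClauses + calc : calc;
        } else {
            int calc = Integer.parseInt(spec);
            result = calc < 0 ? optionalClauses + calc : calc;
        }
        // clamp into [0, optionalClauses]
        return Math.max(0, Math.min(result, optionalClauses));
    }

    public static void main(String[] args) {
        // inner query from the analyzer splitting 大亚湾 into three tokens
        System.out.println(requiredClauses(3, "100%")); // 3
        // top-level clauses of "大亚湾 OR bogus" under the proposed change
        System.out.println(requiredClauses(2, "100%")); // 2
    }
}
```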

I can put together a patch with some test cases to make this behave as folks
here seem to expect, but first I would like some clarification from Robert, if
possible, on whether he agrees that the existing test case should change.

> MM ignored in edismax queries with operators
> --------------------------------------------
>
>                 Key: SOLR-2649
>                 URL: https://issues.apache.org/jira/browse/SOLR-2649
>             Project: Solr
>          Issue Type: Bug
>          Components: query parsers
>            Reporter: Magnus Bergmark
>            Priority: Minor
>             Fix For: 4.7
>
>
> Hypothetical scenario:
>   1. User searches for "stocks oil gold" with MM set to "50%"
>   2. User adds "-stockings" to the query: "stocks oil gold -stockings"
>   3. User gets no hits since MM was ignored and all terms were AND-ed 
> together
> The behavior seems to be intentional, although the reason why is never 
> explained:
>   // For correct lucene queries, turn off mm processing if there
>   // were explicit operators (except for AND).
>   boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; 
> (lines 232-234 taken from 
> tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java)
> This makes edismax unsuitable as a replacement for dismax; mm is one of the 
> primary features of dismax.
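The reporter's expectation can be illustrated with a small, hypothetical
matching sketch: the prohibited clause (-stockings) only excludes documents,
while mm=50% still governs the three optional terms. Documents are modeled as
plain term sets; fields, analysis and scoring are ignored, and the class and
method names are illustrative only.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model of mm applied alongside a prohibited clause (not Solr code).
final class ExpectedMmBehaviorSketch {

    static boolean matches(Set<String> doc, List<String> optional,
                           List<String> prohibited, int minShouldMatch) {
        for (String p : prohibited) {
            if (doc.contains(p)) return false; // any prohibited term excludes the doc
        }
        int hits = 0;
        for (String o : optional) {
            if (doc.contains(o)) hits++;
        }
        return hits >= minShouldMatch;
    }

    public static void main(String[] args) {
        List<String> optional = List.of("stocks", "oil", "gold");
        List<String> prohibited = List.of("stockings");
        int mm = 3 * 50 / 100; // 50% of 3 optional clauses, truncated to 1

        System.out.println(matches(Set.of("oil", "prices"), optional, prohibited, mm));    // true
        System.out.println(matches(Set.of("oil", "stockings"), optional, prohibited, mm)); // false
        // If every term is required instead (AND-ed), the first doc no longer matches:
        System.out.println(matches(Set.of("oil", "prices"), optional, prohibited, optional.size())); // false
    }
}
```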



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

