[ 
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869323#comment-13869323
 ] 

Naomi Dushay commented on SOLR-2649:
------------------------------------

I believe the changes Andrew is suggesting sound good.  I recently made careful 
improvements to our CJK Resource discovery (I'm in the midst of blogging about 
it), and in combing through our logs of the last few days, I pulled out a few 
actual use cases where we have CJK characters and "OR":

鈴木重雄 OR 日本精神生成史論
毛澤東 OR 基礎戰
日報 OR 濟南
飄 OR 上海

There are others.  Note that we have no actual cases of CJK + non-CJK 
characters and 'OR'.
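
For anyone who wants to run a similar check over their own query logs, here is 
a minimal sketch of one way to sort logged queries into "CJK only with OR" and 
"CJK mixed with another script and OR". The class, enum, and method names are 
made up for illustration, and this is not the code used for the log analysis 
above. It assumes OR appears as its own whitespace-separated token and treats 
Han, Hiragana, Katakana, and Hangul as "CJK".

```java
import java.lang.Character.UnicodeScript;

// Hypothetical helper: classifies a raw query string by script and use of OR.
class CjkOrQueryCheck {
    enum Kind { NO_OR, CJK_ONLY_OR, MIXED_OR, NON_CJK_OR }

    // True if the code point is in a script commonly used for CJK text.
    static boolean isCjk(int cp) {
        UnicodeScript s = UnicodeScript.of(cp);
        return s == UnicodeScript.HAN || s == UnicodeScript.HIRAGANA
            || s == UnicodeScript.KATAKANA || s == UnicodeScript.HANGUL;
    }

    static Kind classify(String query) {
        boolean sawOr = false, sawCjk = false, sawOther = false;
        for (String token : query.trim().split("\\s+")) {
            if (token.equals("OR")) {   // assumption: operator is a bare token
                sawOr = true;
                continue;
            }
            for (int i = 0; i < token.length(); ) {
                int cp = token.codePointAt(i);
                i += Character.charCount(cp);
                if (isCjk(cp)) {
                    sawCjk = true;
                } else if (Character.isLetter(cp)) {
                    // Skip script-neutral letters such as the Katakana
                    // prolonged sound mark, whose script is COMMON.
                    UnicodeScript s = UnicodeScript.of(cp);
                    if (s != UnicodeScript.COMMON && s != UnicodeScript.INHERITED) {
                        sawOther = true;
                    }
                }
            }
        }
        if (!sawOr) return Kind.NO_OR;
        if (sawCjk && sawOther) return Kind.MIXED_OR;
        return sawCjk ? Kind.CJK_ONLY_OR : Kind.NON_CJK_OR;
    }

    public static void main(String[] args) {
        String[] samples = { "鈴木重雄 OR 日本精神生成史論", "毛澤東 OR 基礎戰", "日報 OR 濟南", "飄 OR 上海" };
        for (String q : samples) {
            System.out.println(q + " -> " + classify(q));
        }
    }
}
```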

In my relevancy tests for CJK (supplied by East Asian language librarians), I 
didn't find many useful examples to exercise the case above.   I could try to 
apply a patch locally and check how it affects our ~1000 relevancy tests, but 
we are currently running Solr 4.4.  It would be much more tractable if a 
Solr 4.x patch were available for testing.

Here is the only realistic example I could find from our test code:
スポーツ OR supotsu
  both clauses translate to "sports" (from Japanese)

So from my perspective, the cjk test is a corner case, and I think Andrew's 
approach sounds great.  Tom Burton-West and I are partly behind Robert Muir's 
fix, so getting Tom BW to weigh in would be great.


> MM ignored in edismax queries with operators
> --------------------------------------------
>
>                 Key: SOLR-2649
>                 URL: https://issues.apache.org/jira/browse/SOLR-2649
>             Project: Solr
>          Issue Type: Bug
>          Components: query parsers
>            Reporter: Magnus Bergmark
>            Priority: Minor
>             Fix For: 4.7
>
>
> Hypothetical scenario:
>   1. User searches for "stocks oil gold" with MM set to "50%"
>   2. User adds "-stockings" to the query: "stocks oil gold -stockings"
>   3. User gets no hits since MM was ignored and all terms were AND-ed 
> together
> The behavior seems to be intentional, although the reason why is never 
> explained:
>   // For correct lucene queries, turn off mm processing if there
>   // were explicit operators (except for AND).
>   boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; 
> (lines 232-234 taken from 
> tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java)
> This makes edismax unsuitable as a replacement for dismax; mm is one of the 
> primary features of dismax.
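
To make the quoted check concrete, here is a rough sketch of how counting 
operators switches mm off. Only the final expression is taken from the issue 
description; the whitespace tokenizing and the class and method names are 
simplifying assumptions and do not reflect how ExtendedDismaxQParserPlugin 
actually parses a query.

```java
// Hypothetical, simplified stand-in for the operator counting in edismax.
class MinMatchCheck {
    // Returns true if mm processing would stay enabled under the quoted rule.
    static boolean doMinMatched(String query) {
        int numOR = 0, numNOT = 0, numPluses = 0, numMinuses = 0;
        // Assumption: operators are found by naive whitespace tokenizing.
        for (String token : query.trim().split("\\s+")) {
            if (token.equals("OR")) {
                numOR++;
            } else if (token.equals("NOT")) {
                numNOT++;
            } else if (token.startsWith("+")) {
                numPluses++;
            } else if (token.startsWith("-")) {
                numMinuses++;
            }
            // AND is deliberately not counted, matching the quoted comment.
        }
        // Expression quoted in the issue description.
        return (numOR + numNOT + numPluses + numMinuses) == 0;
    }

    public static void main(String[] args) {
        System.out.println(doMinMatched("stocks oil gold"));
        System.out.println(doMinMatched("stocks oil gold -stockings"));
    }
}
```

Under this sketch the first query keeps mm processing and the second loses it 
because of the single "-stockings" clause, which is the scenario described in 
the issue. Any query with an explicit OR, including the CJK examples in the 
comment above, would lose it as well.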



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
