Brian created SOLR-6243:
---------------------------

             Summary: eDisMax hidden change - no longer applies disjunction max 
to "pf" query
                 Key: SOLR-6243
                 URL: https://issues.apache.org/jira/browse/SOLR-6243
             Project: Solr
          Issue Type: Bug
          Components: query parsers
    Affects Versions: 4.8.1
            Reporter: Brian


At some point after Solr 3.5 a bug was introduced into eDisMax (Extended DisMax 
Query parser) that is still present as of Solr 4.8.1.  The "pf" part of the query 
(full phrase query) is no longer applied as a disjunction max query. Instead, 
the scores of all matching fields are simply added to the total score.  I.e. they 
are summed, as opposed to taking the max field score plus the tie-breaker times 
the sum of the other matching field scores.
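
To make the difference concrete, here is a minimal Python sketch of the two combination rules described above. It is only the arithmetic, not Solr or Lucene code, and the function names are hypothetical.

```python
def dismax_score(field_scores, tie):
    """Disjunction max: best field score plus tie times the sum of the others."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


def summed_score(field_scores):
    """Plain sum: every matching field contributes its full score."""
    return sum(field_scores)


# Illustrative numbers only (not taken from the debug output).
scores = [2.0, 1.0]
print(dismax_score(scores, 0.5))  # secondary field is damped by the tie-breaker
print(summed_score(scores))       # secondary field counts in full
```

With a tie-breaker below 1, the summed form always scores at least as high as the disjunction max form, and the gap grows with the number of fields that match the phrase.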

This changes the scores and the rankings significantly.  When upgrading from 
Solr 3.5, one of our relevance test measures showed target results dropping 
over a full rank due to this bug.  One key result went from being at rank 7 to 
past rank 40.  I do not see any easy workaround for this.

The following is a comparison between query results for Solr 3.5 and Solr 4.8, 
for one query, showing the "pf" parts of the query and scores.

With debug query turned on, the results are as follows.  They clearly show that 
max is used with the tie-breaker in 3.5 but not in 4.8 for pf: 

query (3.5): 
boost(+(((inlink_text:edg^1.2 | body:edg^0.5 | title:edg^1.2 | 
meta_description:edg^0.5 | url_path:edg^1.2 | file_name:edg^1.2 | 
primary_header:edg^1.2 | secondary_header:edg^0.5)~0.17 (inlink_text:detect^1.2 
| body:detect^0.5 | title:detect^1.2 | meta_description:detect^0.5 | 
url_path:detect^1.2 | file_name:detect^1.2 | primary_header:detect^1.2 | 
secondary_header:detect^0.5)~0.17)~2) (inlink_text:"edg detect"~100^1.2 | 
body:"edg detect"~100^0.5 | title:"edg detect"~100^1.2 | meta_description:"edg 
detect"~100^0.5 | url_path:"edg detect"~100^1.2 | file_name:"edg 
detect"~100^1.2 | primary_header:"edg detect"~100^1.2 | secondary_header:"edg 
detect"~100^0.5)~0.17,product(float(hier_score),pow(float(link_score),const(0.25))))
 

I.e., the "pf" part of the query has the following disjunction max form:
(inlink_text:"edg detect"~100^1.2 | body:"edg detect"~100^0.5 | ... | 
secondary_header:"edg detect"~100^0.5)~0.17

pf results for one result (3.5): 
<lst>
<bool name="match">true</bool>
<float name="value">1.5689207</float>
<str name="description">max plus 0.17 times others of:</str>
<arr name="details">
<lst>
<bool name="match">true</bool>
<float name="value">1.5596248</float>
<str name="description">...</str>
<arr name="details">...</arr>
</lst>
<lst>
<bool name="match">true</bool>
<float name="value">0.054681662</float>
<str name="description">...</str>
<arr name="details">...</arr>
</lst>
</arr>
</lst>
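
As a sanity check on the 3.5 output above, the reported value can be reproduced from the two detail values and the tie of 0.17. This is a small Python sketch of that arithmetic; the variable names are hypothetical, and a loose tolerance is used because Solr computes scores in single precision.

```python
import math

tie = 0.17
best = 1.5596248        # larger of the two detail values in the 3.5 explain
other = 0.054681662     # the remaining detail value
reported = 1.5689207    # value reported for "max plus 0.17 times others of"

# Disjunction max: max + tie * (sum of the other matching field scores)
recomputed = best + tie * other

# Agreement up to single-precision rounding
assert math.isclose(recomputed, reported, abs_tol=1e-6)
print(recomputed)
```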


However, in 4.8, "max" and the tie-breaker are nowhere to be seen for the pf 
part of the query: 
query (4.8): 
boost(+(((inlink_text:edg^1.2 | body:edg^0.5 | title:edg^1.2 | 
meta_description:edg^0.5 | url_path:edg^1.2 | file_name:edg^1.2 | 
primary_header:edg^1.2 | secondary_header:edg^0.5)~0.17 (inlink_text:detect^1.2 
| body:detect^0.5 | title:detect^1.2 | meta_description:detect^0.5 | 
url_path:detect^1.2 | file_name:detect^1.2 | primary_header:detect^1.2 | 
secondary_header:detect^0.5)~0.17)~2) body:"edg detect"~100^0.5 title:"edg 
detect"~100^1.2 url_path:"edg detect"~100^1.2 file_name:"edg detect"~100^1.2 
primary_header:"edg detect"~100^1.2 secondary_header:"edg detect"~100^0.5 
meta_description:"edg detect"~100^0.5 inlink_text:"edg 
detect"~100^1.2,product(float(hier_score),pow(float(link_score),const(0.25)))) 

I.e., the "pf" part of the query does NOT have the disjunction max form:
body:"edg detect"~100^0.5 title:"edg detect"~100^1.2 ... inlink_text:"edg 
detect"~100^1.2,

pf results for one result (4.8) (no max; both values are just listed under the 
"sum of" element): 
<lst>
<bool name="match">true</bool>
<float name="value">0.03554287</float>
<str name="description">...</str>
<arr name="details">...</arr>
</lst>
<lst>
<bool name="match">true</bool>
<float name="value">1.0933692</float>
<str name="description">...</str>
<arr name="details">...</arr>
</lst>
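
For the two 4.8 phrase scores listed above, the following Python sketch compares what they contribute when summed (the 4.8 behavior) with what they would contribute under disjunction max with tie 0.17 (the 3.5 behavior). It is illustrative arithmetic only: the variable names are hypothetical, and the 3.5 and 4.8 excerpts are not stated to come from the same document.

```python
import math

tie = 0.17
pf_scores = [0.03554287, 1.0933692]   # the two pf values from the 4.8 explain

# 4.8 behavior: each matching phrase field is added in full
summed = sum(pf_scores)

# 3.5 behavior: max plus tie times the sum of the others
best = max(pf_scores)
dismax = best + tie * (summed - best)

# The extra score the summed form gives this result
extra = summed - dismax
assert math.isclose(extra, (1 - tie) * min(pf_scores))
print(summed, dismax, extra)
```

With only two matching fields the difference is small, but each additional field matching the phrase adds its full score in 4.8 instead of 0.17 times its score.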



The Solr 4 request handler used is the following; it is the same as the 3.5 one: 
<requestHandler class="solr.SearchHandler" name="/sitewide">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.17</float>
    <str name="qf">
      body^0.5 title^1.2 url_path^1.2 file_name^1.2 primary_header^1.2
      secondary_header^0.5 meta_description^0.5 inlink_text^1.2
    </str>
    <str name="pf">
      body^0.5 title^1.2 url_path^1.2 file_name^1.2 primary_header^1.2
      secondary_header^0.5 meta_description^0.5 inlink_text^1.2
    </str>
    <int name="ps">100</int>
    <str name="boost">
      hier_score
    </str>
    <str name="boost">
      pow(link_score,0.25)
    </str>
  </lst>
  <lst name="spellchecker">
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">3</str>
    <str name="buildOnCommit">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
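
To reproduce the comparison, a request to this handler with debugQuery enabled can be built on both versions and the parsed query and explain sections compared. The Python sketch below only constructs the URL. The host, port, core name and query text are assumptions (the original query string is not given here; "edg detect" is taken from the parsed terms), and the helper name is hypothetical.

```python
from urllib.parse import urlencode


def debug_url(base, query):
    """Build a /sitewide request URL with Solr's debugQuery parameter turned on."""
    params = {"q": query, "debugQuery": "true", "wt": "xml"}
    return base + "/sitewide?" + urlencode(params)


# Assumed local Solr instance and core name; adjust to the actual deployment.
url = debug_url("http://localhost:8983/solr/collection1", "edg detect")
print(url)
```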



