[ 
https://issues.apache.org/jira/browse/SOLR-6243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian resolved SOLR-6243.
-------------------------

    Resolution: Fixed

I started fixing this, then saw that the latest 4.x source had already been 
changed to resolve the issue.

> eDisMax hidden change - no longer applies disjunction max to "pf" query
> -----------------------------------------------------------------------
>
>                 Key: SOLR-6243
>                 URL: https://issues.apache.org/jira/browse/SOLR-6243
>             Project: Solr
>          Issue Type: Bug
>          Components: query parsers
>    Affects Versions: 4.8.1
>            Reporter: Brian
>              Labels: edismax, extendedDisMax, pf, phrase
>
> At some point after Solr 3.5, a bug was introduced into eDisMax (the Extended 
> DisMax query parser) that is still present as of Solr 4.8.1.  The "pf" part of 
> the query (the full phrase query) is no longer applied as a disjunction max 
> query; instead, the scores of all matching fields are simply added to the 
> total score.  That is, they are summed, rather than taking the maximum score 
> plus the tie-breaker times the sum of the other matching scores.
> This changes the scores and the rankings significantly.  When upgrading from 
> Solr 3.5, one of our relevance test measures showed target results dropping 
> by more than a full rank because of this bug.  One key result went from rank 
> 7 to past rank 40.  I do not see any easy workaround for this.
> The following is a comparison between query results for Solr 3.5 and Solr 
> 4.8, for one query, showing the "pf" parts of the query and scores.
> With debug query turned on, the results are as follows.  They clearly show 
> that max is used with the tie-breaker in 3.5 but not in 4.8 for pf: 
> query (3.5): 
> boost(+(((inlink_text:edg^1.2 | body:edg^0.5 | title:edg^1.2 | 
> meta_description:edg^0.5 | url_path:edg^1.2 | file_name:edg^1.2 | 
> primary_header:edg^1.2 | secondary_header:edg^0.5)~0.17 
> (inlink_text:detect^1.2 | body:detect^0.5 | title:detect^1.2 | 
> meta_description:detect^0.5 | url_path:detect^1.2 | file_name:detect^1.2 | 
> primary_header:detect^1.2 | secondary_header:detect^0.5)~0.17)~2) 
> (inlink_text:"edg detect"~100^1.2 | body:"edg detect"~100^0.5 | title:"edg 
> detect"~100^1.2 | meta_description:"edg detect"~100^0.5 | url_path:"edg 
> detect"~100^1.2 | file_name:"edg detect"~100^1.2 | primary_header:"edg 
> detect"~100^1.2 | secondary_header:"edg 
> detect"~100^0.5)~0.17,product(float(hier_score),pow(float(link_score),const(0.25))))
>  
> I.e., the "pf" part of the query has the following disjunction max form:
> (inlink_text:"edg detect"~100^1.2 | body:"edg detect"~100^0.5 | ... | 
> secondary_header:"edg detect"~100^0.5)~0.17
> pf results for one result (3.5): 
> <lst>
> <bool name="match">true</bool>
> <float name="value">1.5689207</float>
> <str name="description">max plus 0.17 times others of:</str>
> <arr name="details">
> <lst>
> <bool name="match">true</bool>
> <float name="value">1.5596248</float>
> <str name="description">...</str>
> <arr name="details">...</arr>
> </lst>
> <lst>
> <bool name="match">true</bool>
> <float name="value">0.054681662</float>
> <str name="description">...</str>
> <arr name="details">...</arr>
> </lst>
> </arr>
> </lst>
> However, in 4.8, "max" and the tie-breaker are nowhere to be seen for the pf 
> part of the query: 
> query (4.8): 
> boost(+(((inlink_text:edg^1.2 | body:edg^0.5 | title:edg^1.2 | 
> meta_description:edg^0.5 | url_path:edg^1.2 | file_name:edg^1.2 | 
> primary_header:edg^1.2 | secondary_header:edg^0.5)~0.17 
> (inlink_text:detect^1.2 | body:detect^0.5 | title:detect^1.2 | 
> meta_description:detect^0.5 | url_path:detect^1.2 | file_name:detect^1.2 | 
> primary_header:detect^1.2 | secondary_header:detect^0.5)~0.17)~2) body:"edg 
> detect"~100^0.5 title:"edg detect"~100^1.2 url_path:"edg detect"~100^1.2 
> file_name:"edg detect"~100^1.2 primary_header:"edg detect"~100^1.2 
> secondary_header:"edg detect"~100^0.5 meta_description:"edg detect"~100^0.5 
> inlink_text:"edg 
> detect"~100^1.2,product(float(hier_score),pow(float(link_score),const(0.25))))
>  
> I.e., the "pf" part of the query does NOT have the disjunction max form:
> body:"edg detect"~100^0.5 title:"edg detect"~100^1.2 ... inlink_text:"edg 
> detect"~100^1.2,
> pf results for one result (4.8) (no max; both values are just listed under 
> the "sum of" element): 
> <lst>
> <bool name="match">true</bool>
> <float name="value">0.03554287</float>
> <str name="description">...</str>
> <arr name="details">...</arr>
> </lst>
> <lst>
> <bool name="match">true</bool>
> <float name="value">1.0933692</float>
> <str name="description">...</str>
> <arr name="details">...</arr>
> </lst>
> The Solr 4 handler used is the following - it's also the same as the 3.5 one: 
>  <requestHandler class="solr.SearchHandler" name="/sitewide">
>     
>      <lst name="defaults">
>        <str name="defType">edismax</str>
>        <str name="echoParams">explicit</str>
>         <float name="tie">0.17</float>
>          <str name="qf">
>            body^0.5 title^1.2 url_path^1.2 file_name^1.2 primary_header^1.2 
> secondary_header^0.5 meta_description^0.5 inlink_text^1.2 
>          </str>
>          <str name="pf">
>            body^0.5 title^1.2 url_path^1.2 file_name^1.2 primary_header^1.2 
> secondary_header^0.5 meta_description^0.5 inlink_text^1.2 
>          </str>
>          <int name="ps">100</int>
>      <str name="boost">
>        hier_score 
>      </str>
>      <str name="boost">
>        pow(link_score,0.25) 
>      </str>
>      </lst>
>      <lst name="spellchecker">
>       
>       <str name="spellcheck.onlyMorePopular">false</str>
>       
>       <str name="spellcheck.extendedResults">true</str>
>       
>       <str name="spellcheck.count">3</str>
>       <str name="buildOnCommit">true</str>
>      </lst>
>      <arr name="last-components">
>        <str>spellcheck</str>
>      </arr>
>   </requestHandler>
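
For anyone comparing the two behaviours described above, here is a minimal sketch of the arithmetic only; it is not Solr code. The function names are made up for illustration. The tie value and the per-field scores are copied from the debug output quoted above. With tie=0.17, the 3.5 pair combines to the reported 1.5689207 (to float precision), while the 4.8 pair is simply summed. The two pairs come from different Solr versions, so their absolute values are not directly comparable; the point is only how each pair is combined.

```python
# Sketch of the two ways the "pf" per-field scores can be combined.
# Function names are hypothetical; the numbers come from the debug output above.

def dismax_score(clause_scores, tie):
    """Disjunction max: best clause score plus tie times the sum of the others."""
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    return best + tie * (sum(clause_scores) - best)


def summed_score(clause_scores):
    """Plain boolean-style combination: every matching clause score is added."""
    return sum(clause_scores)


TIE = 0.17  # the "tie" default from the request handler

pf_scores_35 = [1.5596248, 0.054681662]  # per-field pf scores shown for 3.5
pf_scores_48 = [0.03554287, 1.0933692]   # per-field pf scores shown for 4.8

if __name__ == "__main__":
    # 3.5 behaviour: matches "max plus 0.17 times others of" in the explain output
    print("3.5 pf, dismax:", round(dismax_score(pf_scores_35, TIE), 7))
    # 4.8 behaviour: both values are added under "sum of"
    print("4.8 pf, summed:", round(summed_score(pf_scores_48), 7))
    # What the 4.8 pair would contribute if dismax were still applied
    print("4.8 pf, dismax:", round(dismax_score(pf_scores_48, TIE), 7))
```

The gap between the summed and dismax contributions grows as more pf fields match, which fits the ranking shift reported in the issue.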



--
This message was sent by Atlassian JIRA
(v6.2#6252)
