[ https://issues.apache.org/jira/browse/SOLR-6243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brian resolved SOLR-6243.
-------------------------
Resolution: Fixed
I started fixing this when I saw that the latest 4.X source had already been
changed to resolve the issue.
> eDisMax hidden change - no longer applies disjunction max to "pf" query
> -----------------------------------------------------------------------
>
> Key: SOLR-6243
> URL: https://issues.apache.org/jira/browse/SOLR-6243
> Project: Solr
> Issue Type: Bug
> Components: query parsers
> Affects Versions: 4.8.1
> Reporter: Brian
> Labels: edismax, extendedDisMax, pf, phrase
>
> At some point after Solr 3.5, a bug was introduced into eDisMax (the Extended
> DisMax query parser) that is still present as of Solr 4.8.1. The "pf" part of
> the query (the full-phrase query) is no longer applied as a disjunction max
> query; instead, the scores of all matching fields are simply added to the
> total score. That is, they are summed, rather than combined as the maximum
> plus the tie-breaker times the sum of the other matching scores.
> This changes the scores and the rankings significantly. When upgrading from
> Solr 3.5, one of our relevance test measures showed target results dropping
> by more than a full rank because of this bug. One key result went from rank 7
> to past rank 40. I do not see any easy workaround for this.
> The following compares the results of one query in Solr 3.5 and Solr 4.8,
> showing the "pf" parts of the parsed query and their scores. With debugQuery
> turned on, the results clearly show that the max with the tie-breaker is used
> for pf in 3.5 but not in 4.8:
> query (3.5):
> boost(+(((inlink_text:edg^1.2 | body:edg^0.5 | title:edg^1.2 |
> meta_description:edg^0.5 | url_path:edg^1.2 | file_name:edg^1.2 |
> primary_header:edg^1.2 | secondary_header:edg^0.5)~0.17
> (inlink_text:detect^1.2 | body:detect^0.5 | title:detect^1.2 |
> meta_description:detect^0.5 | url_path:detect^1.2 | file_name:detect^1.2 |
> primary_header:detect^1.2 | secondary_header:detect^0.5)~0.17)~2)
> (inlink_text:"edg detect"~100^1.2 | body:"edg detect"~100^0.5 |
> title:"edg detect"~100^1.2 | meta_description:"edg detect"~100^0.5 |
> url_path:"edg detect"~100^1.2 | file_name:"edg detect"~100^1.2 |
> primary_header:"edg detect"~100^1.2 |
> secondary_header:"edg detect"~100^0.5)~0.17,product(float(hier_score),pow(float(link_score),const(0.25))))
>
> I.e., the "pf" part of the query has the following disjunction max form:
> (inlink_text:"edg detect"~100^1.2 | body:"edg detect"~100^0.5 | ... |
> secondary_header:"edg detect"~100^0.5)~0.17
> pf scores for one matching document (3.5):
> <lst>
> <bool name="match">true</bool>
> <float name="value">1.5689207</float>
> <str name="description">max plus 0.17 times others of:</str>
> <arr name="details">
> <lst>
> <bool name="match">true</bool>
> <float name="value">1.5596248</float>
> <str name="description">...</str>
> <arr name="details">...</arr>
> </lst>
> <lst>
> <bool name="match">true</bool>
> <float name="value">0.054681662</float>
> <str name="description">...</str>
> <arr name="details">...</arr>
> </lst>
> </arr>
> However, in 4.8, "max" and the tie-breaker are nowhere to be seen for the pf
> part of the query:
> query (4.8):
> boost(+(((inlink_text:edg^1.2 | body:edg^0.5 | title:edg^1.2 |
> meta_description:edg^0.5 | url_path:edg^1.2 | file_name:edg^1.2 |
> primary_header:edg^1.2 | secondary_header:edg^0.5)~0.17
> (inlink_text:detect^1.2 | body:detect^0.5 | title:detect^1.2 |
> meta_description:detect^0.5 | url_path:detect^1.2 | file_name:detect^1.2 |
> primary_header:detect^1.2 | secondary_header:detect^0.5)~0.17)~2)
> body:"edg detect"~100^0.5 title:"edg detect"~100^1.2
> url_path:"edg detect"~100^1.2 file_name:"edg detect"~100^1.2
> primary_header:"edg detect"~100^1.2 secondary_header:"edg detect"~100^0.5
> meta_description:"edg detect"~100^0.5
> inlink_text:"edg detect"~100^1.2,product(float(hier_score),pow(float(link_score),const(0.25))))
>
> I.e., the "pf" part of the query does NOT have the disjunction max form:
> body:"edg detect"~100^0.5 title:"edg detect"~100^1.2 ... inlink_text:"edg detect"~100^1.2
> pf scores for one matching document (4.8); there is no max, and both values
> are simply listed under the "sum of" element:
> <lst>
> <bool name="match">true</bool>
> <float name="value">0.03554287</float>
> <str name="description">...</str>
> <arr name="details">...</arr>
> </lst>
> <lst>
> <bool name="match">true</bool>
> <float name="value">1.0933692</float>
> <str name="description">...</str>
> <arr name="details">...</arr>
> </lst>
> The Solr 4 request handler used is shown below; it is the same as the one
> used with 3.5:
> <requestHandler class="solr.SearchHandler" name="/sitewide">
>
> <lst name="defaults">
> <str name="defType">edismax</str>
> <str name="echoParams">explicit</str>
> <float name="tie">0.17</float>
> <str name="qf">
> body^0.5 title^1.2 url_path^1.2 file_name^1.2 primary_header^1.2
> secondary_header^0.5 meta_description^0.5 inlink_text^1.2
> </str>
> <str name="pf">
> body^0.5 title^1.2 url_path^1.2 file_name^1.2 primary_header^1.2
> secondary_header^0.5 meta_description^0.5 inlink_text^1.2
> </str>
> <int name="ps">100</int>
> <str name="boost">
> hier_score
> </str>
> <str name="boost">
> pow(link_score,0.25)
> </str>
> </lst>
> <lst name="spellchecker">
>
> <str name="spellcheck.onlyMorePopular">false</str>
>
> <str name="spellcheck.extendedResults">true</str>
>
> <str name="spellcheck.count">3</str>
> <str name="buildOnCommit">true</str>
> </lst>
> <arr name="last-components">
> <str>spellcheck</str>
> </arr>
> </requestHandler>
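As a rough illustration of the scoring difference, here is a small Python sketch. It assumes the standard disjunction-max combination described above (the maximum clause score plus the tie-breaker times the sum of the other clause scores) and takes the two per-field pf values from each debug excerpt at face value. The function names are hypothetical helpers, not Solr or Lucene APIs. With tie = 0.17, the 3.5 values reproduce the reported 1.5689207, while 4.8 simply adds its two values.

```python
# Hypothetical illustration: compares disjunction-max scoring with plain
# summation, using the per-field "pf" scores quoted in the debug output.


def dismax_score(scores, tie):
    """Max of the clause scores plus `tie` times the sum of the others."""
    if not scores:
        return 0.0
    best = max(scores)
    return best + tie * (sum(scores) - best)


def sum_score(scores):
    """What the 4.8 parse effectively does: add every matching clause."""
    return sum(scores)


TIE = 0.17  # the "tie" value from the /sitewide handler defaults

# Per-field pf scores from the 3.5 debug excerpt.
pf_35 = [1.5596248, 0.054681662]
# Per-field pf scores from the 4.8 debug excerpt.
pf_48 = [0.03554287, 1.0933692]

if __name__ == "__main__":
    print(f"3.5 dismax:             {dismax_score(pf_35, TIE):.7f}")  # 1.5689207, as reported
    print(f"4.8 sum:                {sum_score(pf_48):.7f}")
    print(f"4.8 if scored as dismax: {dismax_score(pf_48, TIE):.7f}")
```

With tie = 0.0 the dismax helper reduces to a pure max, and with tie = 1.0 it becomes the same plain sum that 4.8 produces, which is why the tie-breaker no longer has any effect on the pf part of the 4.8 query.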
--
This message was sent by Atlassian JIRA
(v6.2#6252)