Brian created SOLR-6243:
---------------------------
Summary: eDisMax hidden change - no longer applies disjunction max
to "pf" query
Key: SOLR-6243
URL: https://issues.apache.org/jira/browse/SOLR-6243
Project: Solr
Issue Type: Bug
Components: query parsers
Affects Versions: 4.8.1
Reporter: Brian
At some point after Solr 3.5 a bug was introduced into eDisMax (the Extended DisMax
query parser) that is still present as of Solr 4.8.1. The "pf" part of the query
(the full-phrase query) is no longer applied as a disjunction max query; instead,
all the matching field scores are simply added to the total score. That is, they
are summed, rather than taking the max plus the tie-breaker times the sum of the
other matching scores.
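The difference between the two combination rules can be sketched in a few lines of Python. This is a minimal illustration of the arithmetic described above, not Lucene's actual DisjunctionMaxQuery code; the function names and the per-field scores in the example are hypothetical.

```python
def dismax_score(field_scores, tie):
    """Disjunction max: the best field score plus `tie` times the sum of the others."""
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


def plain_sum_score(field_scores):
    """Independent SHOULD clauses: every matching field score is simply added."""
    return sum(field_scores)


# Hypothetical per-field phrase scores for a single document.
scores = [1.0, 0.5, 0.25]
print(dismax_score(scores, tie=0.17))  # about 1.1275 (1.0 + 0.17 * 0.75)
print(plain_sum_score(scores))         # 1.75
```

With a tie of 0.0 only the best field counts, and with a tie of 1.0 the two rules coincide; any value in between, such as 0.17, keeps a document that matches in many fields from outscoring one that matches strongly in a single field.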
This changes the scores and the rankings significantly. When upgrading from
Solr 3.5, one of our relevance test measures showed target results dropping by
over a full rank due to this bug. One key result went from rank 7 to past
rank 40. I do not see any easy workaround for this.
The following is a comparison between the query results for Solr 3.5 and Solr 4.8
for one query, showing the "pf" parts of the query and their scores.
With debugQuery turned on, the results below clearly show that the max with the
tie-breaker is used for pf in 3.5 but not in 4.8:
query (3.5):
boost(+(((inlink_text:edg^1.2 | body:edg^0.5 | title:edg^1.2 |
meta_description:edg^0.5 | url_path:edg^1.2 | file_name:edg^1.2 |
primary_header:edg^1.2 | secondary_header:edg^0.5)~0.17 (inlink_text:detect^1.2
| body:detect^0.5 | title:detect^1.2 | meta_description:detect^0.5 |
url_path:detect^1.2 | file_name:detect^1.2 | primary_header:detect^1.2 |
secondary_header:detect^0.5)~0.17)~2) (inlink_text:"edg detect"~100^1.2 |
body:"edg detect"~100^0.5 | title:"edg detect"~100^1.2 | meta_description:"edg
detect"~100^0.5 | url_path:"edg detect"~100^1.2 | file_name:"edg
detect"~100^1.2 | primary_header:"edg detect"~100^1.2 | secondary_header:"edg
detect"~100^0.5)~0.17,product(float(hier_score),pow(float(link_score),const(0.25))))
I.e., the "pf" part of the query has the following disjunction max form:
(inlink_text:"edg detect"~100^1.2 | body:"edg detect"~100^0.5 | ... |
secondary_header:"edg detect"~100^0.5)~0.17
pf explain output for one result (3.5):
<lst>
<bool name="match">true</bool>
<float name="value">1.5689207</float>
<str name="description">max plus 0.17 times others of:</str>
<arr name="details">
<lst>
<bool name="match">true</bool>
<float name="value">1.5596248</float>
<str name="description">...</str>
<arr name="details">...</arr>
</lst>
<lst>
<bool name="match">true</bool>
<float name="value">0.054681662</float>
<str name="description">...</str>
<arr name="details">...</arr>
</lst>
</arr>
</lst>
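As a sanity check, the 3.5 value can be reproduced from the two sub-scores in the excerpt: the larger one plus the tie of 0.17 times the other. A short Python sketch of that arithmetic (only the two matching clauses shown above are included; the variable names are illustrative):

```python
tie = 0.17  # the "tie" parameter from the request handler
sub_scores = [1.5596248, 0.054681662]  # the two matching pf clauses in the 3.5 explain

best = max(sub_scores)
combined = best + tie * (sum(sub_scores) - best)
print(round(combined, 7))  # 1.5689207, the "max plus 0.17 times others of" value
```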
However, in 4.8, "max" and the tie-breaker are nowhere to be seen for the pf
part of the query:
query (4.8):
boost(+(((inlink_text:edg^1.2 | body:edg^0.5 | title:edg^1.2 |
meta_description:edg^0.5 | url_path:edg^1.2 | file_name:edg^1.2 |
primary_header:edg^1.2 | secondary_header:edg^0.5)~0.17 (inlink_text:detect^1.2
| body:detect^0.5 | title:detect^1.2 | meta_description:detect^0.5 |
url_path:detect^1.2 | file_name:detect^1.2 | primary_header:detect^1.2 |
secondary_header:detect^0.5)~0.17)~2) body:"edg detect"~100^0.5 title:"edg
detect"~100^1.2 url_path:"edg detect"~100^1.2 file_name:"edg detect"~100^1.2
primary_header:"edg detect"~100^1.2 secondary_header:"edg detect"~100^0.5
meta_description:"edg detect"~100^0.5 inlink_text:"edg
detect"~100^1.2,product(float(hier_score),pow(float(link_score),const(0.25))))
I.e., the "pf" part of the query does NOT have the disjunction max form:
body:"edg detect"~100^0.5 title:"edg detect"~100^1.2 ... inlink_text:"edg
detect"~100^1.2
pf explain output for one result (4.8), with no max; both values are just listed
under the "sum of" element:
<lst>
<bool name="match">true</bool>
<float name="value">0.03554287</float>
<str name="description">...</str>
<arr name="details">...</arr>
</lst>
<lst>
<bool name="match">true</bool>
<float name="value">1.0933692</float>
<str name="description">...</str>
<arr name="details">...</arr>
</lst>
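For comparison, the sketch below contrasts the 4.8 sum of these two values with what a disjunction max using the configured tie of 0.17 would produce. It is plain arithmetic on the two explain values shown above and ignores every other factor in the full score, so it only indicates the direction of the change for the pf part, not the final ranking.

```python
tie = 0.17  # "tie" from the request handler
pf_scores = [0.03554287, 1.0933692]  # the two pf clauses in the 4.8 explain

summed = sum(pf_scores)  # 4.8 behaviour: plain "sum of"
best = max(pf_scores)
dismax = best + tie * (summed - best)  # 3.5 behaviour: max plus tie times the others

print(f"sum:    {summed:.7f}")
print(f"dismax: {dismax:.7f}")
```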
The Solr 4 request handler used is the following; it is the same as the 3.5 one:
<requestHandler class="solr.SearchHandler" name="/sitewide">
<lst name="defaults">
<str name="defType">edismax</str>
<str name="echoParams">explicit</str>
<float name="tie">0.17</float>
<str name="qf">
body^0.5 title^1.2 url_path^1.2 file_name^1.2 primary_header^1.2
secondary_header^0.5 meta_description^0.5 inlink_text^1.2
</str>
<str name="pf">
body^0.5 title^1.2 url_path^1.2 file_name^1.2 primary_header^1.2
secondary_header^0.5 meta_description^0.5 inlink_text^1.2
</str>
<int name="ps">100</int>
<str name="boost">
hier_score
</str>
<str name="boost">
pow(link_score,0.25)
</str>
</lst>
<lst name="spellchecker">
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.count">3</str>
<str name="buildOnCommit">true</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
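To make the two forms concrete, here is a small sketch that turns the handler's pf, ps and tie values into the textual clause shapes seen in the debug output: one disjunction with a tie-breaker (3.5) versus independent phrase clauses (4.8). It only mimics the printed query syntax and is not eDisMax's implementation; the helper names are hypothetical, and Solr 3.5 printed the fields in a different order than pf lists them. The phrase "edg detect" is the analyzed text taken from the debug output.

```python
def parse_field_boosts(spec):
    """Split a qf/pf string like "body^0.5 title^1.2" into (field, boost) pairs."""
    pairs = []
    for token in spec.split():
        field, _, boost = token.partition("^")
        pairs.append((field, boost))  # boost is "" when none is given
    return pairs


def phrase_clause(field, boost, phrase, ps):
    """One sloppy phrase clause, e.g. body:"edg detect"~100^0.5."""
    clause = f'{field}:"{phrase}"~{ps}'
    return f"{clause}^{boost}" if boost else clause


def dismax_pf_clause(pf, phrase, ps, tie):
    """3.5-style form: one disjunction over all pf fields, with a tie-breaker."""
    parts = [phrase_clause(f, b, phrase, ps) for f, b in parse_field_boosts(pf)]
    return "(" + " | ".join(parts) + f")~{tie}"


def flat_pf_clauses(pf, phrase, ps):
    """4.8-style form as printed in debugQuery: independent SHOULD clauses."""
    return " ".join(phrase_clause(f, b, phrase, ps) for f, b in parse_field_boosts(pf))


pf = ("body^0.5 title^1.2 url_path^1.2 file_name^1.2 primary_header^1.2 "
      "secondary_header^0.5 meta_description^0.5 inlink_text^1.2")
print(dismax_pf_clause(pf, "edg detect", 100, 0.17))
print(flat_pf_clauses(pf, "edg detect", 100))
```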
--
This message was sent by Atlassian JIRA
(v6.2#6252)