[ 
https://issues.apache.org/jira/browse/SOLR-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16744256#comment-16744256
 ] 

Thomas Aglassinger edited comment on SOLR-13126 at 1/16/19 5:15 PM:
--------------------------------------------------------------------

We dug in further and seem to have found the culprit. The test case in the 
attached patch {{0002-SOLR-13126-Added-test-case.patch}} reproduces the bug. 
The last working version is Solr 7.2.1.

Using {{git bisect}} we found that the issue was introduced with 
LUCENE-8099 (a refactoring). There are two changes that break the scoring in 
different ways:
 * [LUCENE-8099: Deprecate CustomScoreQuery, BoostedQuery, 
BoostingQuery|https://github.com/apache/lucene-solr/commit/b01e6023e1cd3c62260b38c05c8d145ba143a2ac]
 * [LUCENE-8099: Replace BoostQParserPlugin.boostQuery() with 
FunctionScoreQuery.boostByValue()|https://github.com/apache/lucene-solr/commit/0744fea821366a853b8e239e766b9786ef96cb27]

The attached patch 
{{0001-use-deprecated-classes-to-fix-regression-introduced-.patch}} contains an 
experimental fix that reverts parts of the code to their previous version, 
which is based on a deprecated class that the LUCENE-8099 refactoring tried to 
replace (among other things).

This is a rough initial patch with the following known issues:
 # The patch targets Solr 7.5.0. This is the version in which we currently 
experience the issue and which we are trying to get working again for 
production use. Ideally, of course, the patch would target master and then be 
backported to earlier versions.
 # The fix uses a deprecated class. Ideally it would fix the refactored classes 
from LUCENE-8099.
 # The patches are a bit bigger than they should be due to some automatic 
whitespace reformatting in the IDE.

Nevertheless the test case is generic enough to run on all branches, including 
the current master.
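
For anyone who wants to reproduce the problem by hand rather than through the test case, the request parameters from the issue description can be assembled roughly as follows. This is only a sketch: the helper names {{boost_function}} and {{build_params}} are made up for illustration, and the {{fq}} filter is left out because it is specific to our catalog.

```python
from urllib.parse import urlencode


def boost_function(rules, field="name_text_de"):
    """Build the product(query(...), ...) function for term -> boost rules.

    Hypothetical helper; it only mirrors the "ymb" value from the issue.
    """
    parts = []
    for term, boost in rules.items():
        # Colon and caret are backslash-escaped inside the quoted local
        # parameter, as in the "ymb" value shown in the issue description.
        parts.append('query({!v="%s\\:%s\\^=%s"},1)' % (field, term, boost))
    return "product(" + ",".join(parts) + ")"


def build_params(rules):
    """Assemble the request parameters of the reduced query (without fq)."""
    return {
        "q": "{!boost b=$ymb}(+{!lucene v=$yq})",
        "ymb": boost_function(rules),
        "yq": "*:*",
        "sort": "score desc",
        "debugQuery": "true",
        "fl": "score,id,code_string,name_text_de",
        "rows": "10",
    }


params = build_params({"Netzteil": 2.0, "Sony": 3.0})
# URL-encoded form that could be appended to a select handler URL.
query_string = urlencode(params)
```

The resulting {{query_string}} can be sent to the select handler of whatever core is being tested; the core name and host depend on the local setup.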


> Inconsistent score in debug and result with multiple multiplicative boosts
> --------------------------------------------------------------------------
>
>                 Key: SOLR-13126
>                 URL: https://issues.apache.org/jira/browse/SOLR-13126
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: search
>    Affects Versions: 7.5.0
>         Environment: Reproduced with macOS 10.14.1, a quick test with Windows 
> 10 showed the same result.
>            Reporter: Thomas Aglassinger
>            Priority: Major
>         Attachments: 
> 0001-use-deprecated-classes-to-fix-regression-introduced-.patch, 
> 0002-SOLR-13126-Added-test-case.patch, debugQuery.json, 
> solr_match_neither_nextteil_nor_sony.json, 
> solr_match_neither_nextteil_nor_sony.txt, solr_match_netzteil_and_sony.json, 
> solr_match_netzteil_and_sony.txt, solr_match_netzteil_only.json, 
> solr_match_netzteil_only.txt
>
>
> Under certain circumstances, queries with multiple multiplicative boosts 
> using the Solr functions {{product()}} and {{query()}} return search results 
> whose score is inconsistent with the one from the debugQuery information. 
> Only the debug score is correct, while the actual search results show a 
> wrong score.
> This seems somewhat similar to the behaviour described in 
> https://issues.apache.org/jira/browse/LUCENE-7132, though that issue was 
> resolved a while ago.
> A little background: we are using Solr as a search platform for the 
> e-commerce framework SAP Hybris. There the shop administrator can create 
> multiplicative boost rules (see below for an example) where a value like 2.0 
> means that an item gets boosted to 200%. This works fine in the demo shop 
> distributed by SAP but breaks in our shop. We encountered the issue when 
> upgrading from Solr 7.2.1 / Hybris 6.7 to Solr 7.5 / Hybris 18.8.3 (which 
> would have been named Hybris 6.8 but the version naming scheme changed).
> We reduced the Solr query generated by Hybris to the relevant parts and could 
> reproduce the issue in the Solr admin without any Hybris connection.
> I attached the JSON result of a test query but here's a description of the 
> parts that seemed most relevant to me.
> The {{responseHeader.params}} reads (slightly rearranged):
> {code:java}
> "q":"{!boost b=$ymb}(+{!lucene v=$yq})",
> "ymb":"product(query({!v=\"name_text_de\\:Netzteil\\^=2.0\"},1),query({!v=\"name_text_de\\:Sony\\^=3.0\"},1))",
> "yq":"*:*",
> "sort":"score desc",
> "debugQuery":"true",
> // Added to keep the output small but probably unrelated to the actual issue
> "fl":"score,id,code_string,name_text_de",
> "fq":"catalogId:\"someProducts\"",
> "rows":"10",
> {code}
> This example boosts the German product name (field {{name_text_de}}) in case 
> it contains certain terms:
>  * "Netzteil" (power supply) is boosted to 200%
>  * "Sony" is boosted to 300%
> Consequently a product containing both terms should be boosted to 600%.
> Also, the query function specifies 1 as the default value in case the name 
> does not contain the respective term, resulting in a pseudo boost that 
> preserves the score.
> According to the debug information the parser used is the LuceneQParser, 
> which translates this to the following parsed query:
> {quote}FunctionScoreQuery(FunctionScoreQuery(+*:*, scored by 
> boost(product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0)))))
> {quote}
> And the translated boost is:
> {quote}org.apache.lucene.queries.function.valuesource.ProductFloatFunction:product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0))
> {quote}
> When taking a look at the search result, among others, the following products 
> are included (see the JSON comments for an analysis of each result):
> {code:javascript}
>      {
>         "id":"someProducts/Online/test7111111",
>         "name_text_de":"Original Sony Vaio Netzteil",
>         "code_string":"test7111111",
>         // CORRECT, both "Netzteil" and "Sony" are included in the name
>         "score":6.0},
>       {
>         "id":"someProducts/Online/taxTestingProductThree",
>         "name_text_de":"Steuertestprodukt Zwei",
>         "code_string":"taxTestingProductThree",
>         // CORRECT, neither "Netzteil" nor "Sony" are included in the name
>         "score":1.0},
>       {
>         "id":"someProducts/Online/797856300000",
>         "name_text_de":"GS-Netzteil 20W schwarz",
>         "code_string":"797856300000",
>         // WRONG, "Netzteil" is part of the name; 
>         // note that we do split words on hyphen because 
>         // WordDelimiterGraphFilterFactory.generateWordParts="1"
>         "score":1.0},
> {code}
> So apparently the multiplicative boost works for product names where all the 
> boosted terms are included but fails if only one of the terms matches.
> There are also other products in the result that contain either "Netzteil" or 
> "Sony" but still get a score of 1.0 instead of 2.0 or 3.0, respectively.
> Surprisingly, in the {{explain}} section the score for the product with 
> "Netzteil" but without "Sony" is correctly 2.0:
> {code:java}
> 2.0 = product of:
>   1.0 = boost
>   2.0 = product of:
>     1.0 = *:*
>     2.0 = product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0)=2.0,query((ConstantScore(name_text_de:sony))^3.0,def=1.0)=1.0)
> {code}
> The type definition of {{text_de}} in the {{schema.xml}} (which is used for 
> "name_text_de") includes the following filters:
> {code:xml}
> <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
>     <analyzer>
>         <tokenizer class="solr.WhitespaceTokenizerFactory" />
>         <filter class="solr.WordDelimiterGraphFilterFactory" preserveOriginal="1"
>                 generateWordParts="1" generateNumberParts="1" catenateWords="1"
>                 catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
>         <filter class="solr.LowerCaseFilterFactory" />
>     </analyzer>
> </fieldType>
> {code}
> The {{solrconfig.xml}} is mostly taken from the Hybris defaults and AFAIK 
> does not do anything unusual. The following lines might be of interest:
> {code:xml}
> <luceneMatchVersion>7.5.0</luceneMatchVersion>
> <queryParser name="multiMaxScore" 
> class="de.hybris.platform.solr.search.MultiMaxScoreQParserPlugin"/>
> {code}
> To sum it up, my expectation would have been:
> * The scores in the result and explain sections are identical.
> * Names matching only one of the two multiplied boost terms receive the 
> respective single boost instead of the default score 1.0.
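
The expectation described above can be written down as a small model. This is only a sketch of the intended arithmetic, not of how Solr computes scores internally: the function {{expected_score}} is hypothetical, and the token sets are a hand-written simplification of what the {{text_de}} analyzer would produce.

```python
def expected_score(name_tokens, boosts, base_score=1.0):
    """Multiply the base score by the boost of every term found in the name.

    Terms that do not occur contribute the neutral factor 1.0, which is the
    role of the default value in query(..., 1).
    """
    score = base_score
    for term, boost in boosts.items():
        score *= boost if term in name_tokens else 1.0
    return score


# Boost rules from the example query (terms lowercased like the indexed tokens).
boosts = {"netzteil": 2.0, "sony": 3.0}

# Simplified, assumed token sets for the three products shown in the issue.
both_terms = expected_score({"original", "sony", "vaio", "netzteil"}, boosts)
neither_term = expected_score({"steuertestprodukt", "zwei"}, boosts)
netzteil_only = expected_score({"gs", "netzteil", "20w", "schwarz"}, boosts)
```

Under this model the product containing only "Netzteil" should score 2.0, which matches the {{explain}} output but not the 1.0 reported in the actual result list.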


