[
https://issues.apache.org/jira/browse/SOLR-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767853#comment-16767853
]
Tobias Ibounig commented on SOLR-13126:
---------------------------------------
[~mkhludnev]
The test case was derived from SampleTest, where these configs are used; we
just didn't change them.
https://github.com/apache/lucene-solr/blob/master/solr/core/src/test/org/apache/solr/SampleTest.java
> Inconsistent score in debug and result with multiple multiplicative boosts
> --------------------------------------------------------------------------
>
> Key: SOLR-13126
> URL: https://issues.apache.org/jira/browse/SOLR-13126
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: search
> Affects Versions: 7.5.0
> Environment: Reproduced with macOS 10.14.1, a quick test with Windows
> 10 showed the same result.
> Reporter: Thomas Aglassinger
> Priority: Major
> Attachments:
> 0001-use-deprecated-classes-to-fix-regression-introduced-.patch,
> 0002-SOLR-13126-Added-test-case.patch, SOLR-13126.patch, debugQuery.json,
> image-2019-02-13-16-17-56-272.png, screenshot-1.png,
> solr_match_neither_nextteil_nor_sony.json,
> solr_match_neither_nextteil_nor_sony.txt, solr_match_netzteil_and_sony.json,
> solr_match_netzteil_and_sony.txt, solr_match_netzteil_only.json,
> solr_match_netzteil_only.txt
>
>
> Under certain circumstances search results from queries with multiple
> multiplicative boosts using the Solr functions {{product()}} and {{query()}}
> result in a score that is inconsistent with the one from the debugQuery
> information. Also only the debug score is correct while the actual search
> results show a wrong score.
> This seems somewhat similar to the behaviour described in
> https://issues.apache.org/jira/browse/LUCENE-7132, though that issue was
> resolved a while ago.
> A little background: we are using Solr as a search platform for the
> e-commerce framework SAP Hybris. There the shop administrator can create
> multiplicative boost rules (see below for an example) where a value like 2.0
> means that an item gets boosted to 200%. This works fine in the demo shop
> distributed by SAP but breaks in our shop. We encountered the issue when
> upgrading from Solr 7.2.1 / Hybris 6.7 to Solr 7.5 / Hybris 18.8.3 (which
> would have been named Hybris 6.8 but the version naming scheme changed).
> We reduced the Solr query generated by Hybris to the relevant parts and could
> reproduce the issue in the Solr admin without any Hybris connection.
> I attached the JSON result of a test query but here's a description of the
> parts that seemed most relevant to me.
> The {{responseHeader.params}} reads (slightly rearranged):
> {code:java}
> "q":"{!boost b=$ymb}(+{!lucene v=$yq})",
> "ymb":"product(query({!v=\"name_text_de\\:Netzteil\\^=2.0\"},1),query({!v=\"name_text_de\\:Sony\\^=3.0\"},1))",
> "yq":"*:*",
> "sort":"score desc",
> "debugQuery":"true",
> // Added to keep the output small but probably unrelated to the actual issue
> "fl":"score,id,code_string,name_text_de",
> "fq":"catalogId:\"someProducts\"",
> "rows":"10",
> {code}
> This example boosts the German product name (field {{name_text_de}}) in case
> it contains certain terms:
> * "Netzteil" (power supply) is boosted to 200%
> * "Sony" is boosted to 300%
> Consequently a product containing both terms should be boosted to 600%.
> Also, the query function has the value 1 specified as default in case the name
> does not contain the respective term, resulting in a pseudo boost that
> preserves the score.
> According to the debug information the parser used is the LuceneQParser,
> which translates this to the following parsed query:
> {quote}FunctionScoreQuery(FunctionScoreQuery(+*:*, scored by
> boost(product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0)))))
> {quote}
> And the translated boost is:
> {quote}org.apache.lucene.queries.function.valuesource.ProductFloatFunction:product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0))
> {quote}
> When taking a look at the search result, among others the following products
> are included (see the JSON comments for an analysis of each result):
> {code:javascript}
> {
> "id":"someProducts/Online/test7111111",
> "name_text_de":"Original Sony Vaio Netzteil",
> "code_string":"test7111111",
> // CORRECT, both "Netzteil" and "Sony" are included in the name
> "score":6.0},
> {
> "id":"someProducts/Online/taxTestingProductThree",
> "name_text_de":"Steuertestprodukt Zwei",
> "code_string":"taxTestingProductThree",
> // CORRECT, neither "Netzteil" nor "Sony" is included in the name
> "score":1.0},
> {
> "id":"someProducts/Online/797856300000",
> "name_text_de":"GS-Netzteil 20W schwarz",
> "code_string":"797856300000",
> // WRONG, "Netzteil" is part of the name;
> // note that we do split words on hyphen because
> // WordDelimiterGraphFilterFactory.generateWordParts="1"
> "score":1.0},
> {code}
> So apparently the multiplicative boost works for product names where all the
> boosted terms are included but fails if only one of the terms matches.
> There are also other products in the result that contain either "Netzteil" or
> "Sony" but still get a score of 1.0 instead of 2.0 resp. 3.0.
> Surprisingly, in the {{explain}} segment the score for the product with
> "Netzteil" but without "Sony" is correctly 2.0:
> {code:java}
> 2.0 = product of:
> 1.0 = boost
> 2.0 = product of:
> 1.0 = *:*
> 2.0 =
> product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0)=2.0,query((ConstantScore(name_text_de:sony))^3.0,def=1.0)=1.0)
> {code}
> The type definition of {{text_de}} in the {{schema.xml}} (which is used for
> "name_text_de") includes the following filters:
> {code:xml}
> <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
> <analyzer>
> <tokenizer class="solr.WhitespaceTokenizerFactory" />
> <filter class="solr.WordDelimiterGraphFilterFactory"
> preserveOriginal="1"
> generateWordParts="1" generateNumberParts="1"
> catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
> <filter class="solr.LowerCaseFilterFactory" />
> </analyzer>
> </fieldType>
> {code}
> The {{solrconfig.xml}} is mostly taken from the Hybris defaults and AFAIK
> does not do anything unusual. The following lines might be of interest:
> {code:xml}
> <luceneMatchVersion>7.5.0</luceneMatchVersion>
> <queryParser name="multiMaxScore"
> class="de.hybris.platform.solr.search.MultiMaxScoreQParserPlugin"/>
> {code}
> To sum it up, my expectation would have been:
> * The scores in the result and explain sections are identical.
> * Names matching only one of the two multiplied boost terms receive the
> respective single boost instead of the default score 1.0.
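
The expectation listed above can be sketched in a few lines of Python. This
is only an illustration, not Solr code: the names BOOSTS, tokens and
expected_score are hypothetical, and the tokenizer is a rough stand-in for
the text_de analyzer chain (lowercasing plus splitting on non-alphanumeric
characters). The product names and reported scores are the three results
quoted in the issue description.

```python
import re

# Boost rules from the issue's example: term -> multiplicative boost.
BOOSTS = {"netzteil": 2.0, "sony": 3.0}


def tokens(name):
    """Rough stand-in for the text_de analyzer (assumption):
    lowercase, then split on anything that is not a letter or digit."""
    return set(re.findall(r"[^\W_]+", name.lower()))


def expected_score(name, boosts=BOOSTS, default=1.0):
    """Multiply the boost of every matching term; a term that does not
    match contributes `default`, mirroring query(..., 1) in the request."""
    score = 1.0  # score of the *:* base query
    words = tokens(name)
    for term, boost in boosts.items():
        score *= boost if term in words else default
    return score


# (product name, score reported in the issue's result list)
reported = [
    ("Original Sony Vaio Netzteil", 6.0),
    ("Steuertestprodukt Zwei", 1.0),
    ("GS-Netzteil 20W schwarz", 1.0),
]

for name, actual in reported:
    expected = expected_score(name)
    status = "ok" if expected == actual else "MISMATCH"
    print(f"{name!r}: expected {expected}, reported {actual} -> {status}")
```

Under these assumptions only the "GS-Netzteil 20W schwarz" entry is flagged,
which matches the product the issue description marks as WRONG.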