Thomas Aglassinger created SOLR-13126:
-----------------------------------------

             Summary: Inconsistent score in debug and result with multiple 
multiplicative boosts
                 Key: SOLR-13126
                 URL: https://issues.apache.org/jira/browse/SOLR-13126
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: search
    Affects Versions: 7.5.0
         Environment: Reproduced with macOS 10.14.1, a quick test with Windows 
10 showed the same result.
            Reporter: Thomas Aglassinger
         Attachments: debugQuery.json

Under certain circumstances, queries with multiple multiplicative boosts 
using the Solr functions {{product()}} and {{query()}} return a score that is 
inconsistent with the one in the debugQuery information. Only the debug score 
is correct; the actual search results show a wrong score.

This seems somewhat similar to the behaviour described in 
https://issues.apache.org/jira/browse/LUCENE-7132, though that issue was 
resolved a while ago.

A little background: we are using Solr as a search platform for the e-commerce 
framework SAP Hybris. There the shop administrator can create multiplicative 
boost rules (see below for an example) where a value like 2.0 means that an 
item gets boosted to 200%. This works fine in the demo shop distributed by SAP 
but breaks in our shop. We encountered the issue when upgrading from Solr 7.2.1 
/ Hybris 6.7 to Solr 7.5 / Hybris 18.8.3 (which would have been named Hybris 
6.8, but the version naming scheme changed).

We reduced the Solr query generated by Hybris to the relevant parts and could 
reproduce the issue in the Solr admin without any Hybris connection. 

I attached the JSON result of a test query but here's a description of the 
parts that seemed most relevant to me.

The {{responseHeader.params}} reads (slightly rearranged):

{code}
"q":"{!boost b=$ymb}(+{!lucene v=$yq})",
"ymb":"product(query({!v=\"name_text_de\\:Netzteil\\^=2.0\"},1),query({!v=\"name_text_de\\:Sony\\^=3.0\"},1))",
"yq":"*:*",
"sort":"score desc",
"debugQuery":"true",
// Added to keep the output small but probably unrelated to the actual issue
"fl":"score,id,code_string,name_text_de",
"fq":"catalogId:\"someProducts\"",
"rows":"10",
{code}
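
To run the same query outside the Solr admin UI, the parameters above can be encoded into a request URL. The following is a minimal sketch: the host, port and core name in {{SOLR_SELECT_URL}} are assumptions and need to be adapted, and the parameter values are the unescaped form of the JSON strings shown above.

```python
from urllib.parse import urlencode

# Assumption: local Solr instance with a hypothetical core name "mycore".
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"

params = {
    "q": "{!boost b=$ymb}(+{!lucene v=$yq})",
    # Raw string: the backslashes escape ":" and "^" for the Lucene parser.
    "ymb": r'product(query({!v="name_text_de\:Netzteil\^=2.0"},1),'
           r'query({!v="name_text_de\:Sony\^=3.0"},1))',
    "yq": "*:*",
    "sort": "score desc",
    "debugQuery": "true",
    # Added to keep the output small, probably unrelated to the issue.
    "fl": "score,id,code_string,name_text_de",
    "fq": 'catalogId:"someProducts"',
    "rows": "10",
}

# urlencode percent-encodes braces, quotes and backslashes.
request_url = SOLR_SELECT_URL + "?" + urlencode(params)
print(request_url)
```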

This example boosts the German product name (field {{name_text_de}}) if it 
contains certain terms:

* "Netzteil" (power supply) is boosted to 200%
* "Sony" is boosted to 300%

Consequently a product containing both terms should be boosted to 600%.

In addition, each {{query()}} function specifies 1 as its default value, which 
applies when the name does not contain the respective term. This acts as a 
neutral boost that preserves the score.
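
The expected arithmetic can be sketched as follows. This is only an illustration of the intended behaviour, not of Solr's implementation: the helper name {{expected_boost}} is hypothetical, and the name is assumed to be already split into lower-case tokens. Because the base query {{*:*}} scores 1.0, the expected final score equals the boost.

```python
# Boost factors taken from the example query above.
BOOSTS = {"netzteil": 2.0, "sony": 3.0}


def expected_boost(name_tokens):
    """Multiply the factors of all boosted terms found in the name."""
    boost = 1.0
    for term, factor in BOOSTS.items():
        # query(..., 1) yields the constant score on a match, else the default 1.
        boost *= factor if term in name_tokens else 1.0
    return boost


print(expected_boost({"original", "sony", "vaio", "netzteil"}))  # 6.0
print(expected_boost({"gs", "netzteil", "20w", "schwarz"}))      # 2.0
print(expected_boost({"steuertestprodukt", "zwei"}))             # 1.0
```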

According to the debug information the parser used is the LuceneQParser, which 
translates this to the following parsed query:

FunctionScoreQuery(FunctionScoreQuery(+*:*, scored by 
boost(product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0)))))

And the translated boost is:

org.apache.lucene.queries.function.valuesource.ProductFloatFunction:product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0))

Among others, the search result includes the following products (see the JSON 
comments for an analysis of each result):

{code:javascript}
     {
        "id":"someProducts/Online/test7111111",
        "name_text_de":"Original Sony Vaio Netzteil",
        "code_string":"test7111111",
        // CORRECT, both "Netzteil" and "Sony" are included in the name
        "score":6.0},
      {
        "id":"someProducts/Online/taxTestingProductThree",
        "name_text_de":"Steuertestprodukt Zwei",
        "code_string":"taxTestingProductThree",
        // CORRECT, neither "Netzteil" nor "Sony" is included in the name
        "score":1.0},
      {
        "id":"someProducts/Online/797856300000",
        "name_text_de":"GS-Netzteil 20W schwarz",
        "code_string":"797856300000",
        // WRONG, "Netzteil" is part of the name; 
        // note that we do split words on hyphen because 
        // WordDelimiterGraphFilterFactory.generateWordParts="1"
        "score":1.0},
{code}

So apparently the multiplicative boost works for product names that contain 
all the boosted terms but fails if only one of the terms matches.

Other products in the result that contain either "Netzteil" or "Sony" also get 
a score of 1.0 instead of 2.0 or 3.0, respectively.

Surprisingly, in the {{explain}} segment the score for the product with 
"Netzteil" but without "Sony" is correctly 2.0:

{code}
2.0 = product of:
  1.0 = boost
  2.0 = product of:
    1.0 = *:*
    2.0 = product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0)=2.0,query((ConstantScore(name_text_de:sony))^3.0,def=1.0)=1.0)
{code}
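
A small script can list every document whose returned score differs from the explained one. This is a sketch under assumptions: the function name {{find_score_mismatches}} is hypothetical, {{debug.explain}} is assumed to be the plain-text form keyed by the document {{id}} (that is, structured explain output is not enabled), and the sample data below is shortened to the one document quoted above.

```python
import re


def find_score_mismatches(solr_response, tolerance=1e-6):
    """Return (id, result score, explained score) for inconsistent documents."""
    docs = solr_response["response"]["docs"]
    explains = solr_response["debug"]["explain"]
    mismatches = []
    for doc in docs:
        text = explains.get(doc["id"], "")
        # The explain text starts with the total score, e.g. "2.0 = product of:".
        match = re.match(r"\s*([0-9.]+(?:[eE][-+]?\d+)?)\s*=", text)
        if match is None:
            continue  # no explanation available for this document
        explained = float(match.group(1))
        if abs(explained - doc["score"]) > tolerance:
            mismatches.append((doc["id"], doc["score"], explained))
    return mismatches


# Sample shortened to the "GS-Netzteil" document from this report.
sample = {
    "response": {"docs": [
        {"id": "someProducts/Online/797856300000", "score": 1.0},
    ]},
    "debug": {"explain": {
        "someProducts/Online/797856300000":
            "\n2.0 = product of:\n  1.0 = boost\n  2.0 = product of:\n",
    }},
}
print(find_score_mismatches(sample))
```

To check the attached file instead, the response can be loaded with {{json.load}} from {{debugQuery.json}} and passed to the same function.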

The type definition of {{text_de}} in the {{schema.xml}} (which is used for 
"name_text_de") includes the following filters:

{code:xml}
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.WordDelimiterGraphFilterFactory" preserveOriginal="1"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
</fieldType>
{code}
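
To illustrate why "GS-Netzteil" is expected to match the term "netzteil", here is a rough approximation of this analyzer chain. It is a deliberate simplification and not the actual filter logic: it only splits on whitespace, keeps the original token, splits on non-alphanumeric characters and lower-cases everything. Catenation, case-change and numeric splitting are ignored, and the function name {{approximate_tokens}} is hypothetical.

```python
import re


def approximate_tokens(text):
    """Very rough stand-in for the text_de analyzer chain (see caveats above)."""
    tokens = set()
    for word in text.split():            # WhitespaceTokenizerFactory
        tokens.add(word.lower())         # preserveOriginal="1" + lower-casing
        # generateWordParts="1": split on delimiters such as the hyphen
        for part in re.split(r"[^0-9A-Za-zÄÖÜäöüß]+", word):
            if part:
                tokens.add(part.lower())
    return tokens


tokens = approximate_tokens("GS-Netzteil 20W schwarz")
print("netzteil" in tokens)  # True
print("sony" in tokens)      # False
```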

The {{solrconfig.xml}} is mostly taken from the Hybris defaults and AFAIK does 
not do anything unusual. The following lines might be of interest:

{code:xml}
<luceneMatchVersion>7.5.0</luceneMatchVersion>
<queryParser name="multiMaxScore" class="de.hybris.platform.solr.search.MultiMaxScoreQParserPlugin"/>
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
