[jira] [Updated] (SOLR-15449) edimax sow causes issues with minimum should match in case of multi field with different analysis

Mayya Sharipova (Jira) Wed, 23 Jun 2021 06:50:24 -0700


     [ 
https://issues.apache.org/jira/browse/SOLR-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Mayya Sharipova updated SOLR-15449:
-----------------------------------
    Security:     (was: Public)

> edimax sow causes issues with minimum should match in case of multi field 
> with different analysis
> -------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-15449
>                 URL: https://issues.apache.org/jira/browse/SOLR-15449
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 8.8.2
>            Reporter: Alessandro Benedetti
>            Priority: Major
>             Fix For: main (9.0)
>
>          Time Spent: 4h
>  Remaining Estimate: 0h
>
> h1. Intro
> In a multi-field search where text analysis produces a different number of 
> tokens per field:
> sow=true causes the minimum should match (mm) to be applied "per document",
> i.e. to match, a document must contain all the query terms required by mm 
> at least once, in any of the fields.
> sow=false causes the minimum should match to be applied "per field",
> i.e. to match, a document must contain all the query terms required by mm 
> at least once within a single field.
> When the parsed query moves from being term-centric (sow=true) to 
> field-centric (sow=false and different text analysis), mm means two different things:
> {code:java}
> sow = true
> mm=2
> qf = author subjects_as_same_term
> q = united kingdom
> defType = edismax
> "parsedquery_toString":
> "+(((author:united | subjects_as_same_term:united) (author:kingdom | 
> subjects_as_same_term:kingdom))~2)"
> {code}
> {code:java}
> "response":{"numFound":2,"start":0,"maxScore":7.757958,"numFoundExact":true,"docs":[
>       {
>         "id":"888888",
>         "author":"united",
>         "subjects":["kingdom"],
>         "score":7.757958},
>       {
>         "id":"77777",
>         "author":"united kingdom",
>         "score":5.874222}]
>   },
> {code}
> With sow=false, mm is the minimum number of query terms that must match 
> within the same field (i.e. all required query terms must be found in one 
> of the fields):
> "PER FIELD"
> {code:java}
> sow = false
> mm=2
> qf = author subjects_as_same_term
> q = united kingdom
> defType = edismax
> "parsedquery_toString":
> "+(((author:united author:kingdom)~2) | 
> (((subjects_as_same_term:uk subjects_as_same_term:"united kingdom" 
> subjects_as_same_term:england subjects_as_same_term:london 
> subjects_as_same_term:british subjects_as_same_term:britain))~1))"
> {code}
> Here (author:united author:kingdom)~2 means both clauses must match for a 
> document to be a candidate. It is in disjunction with
> (subjects_as_same_term:uk subjects_as_same_term:"united kingdom" 
> subjects_as_same_term:england subjects_as_same_term:london 
> subjects_as_same_term:british subjects_as_same_term:britain)~1, which means 
> at least one clause must match (because synonym expansion turned the two 
> original terms into a single one).
> {code:java}
> "response":{"numFound":1,"start":0,"maxScore":5.874222,"numFoundExact":true,"docs":[
>       {
>         "id":"77777",
>         "author":"united kingdom",
>         "score":5.874222}]
>   }
> {code}
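The difference between the two parsed-query shapes above can be sketched with a toy model. This is not Solr code: each field is reduced to a set of indexed tokens, the phrase clause "united kingdom" is treated as a single token, and the contents of subjects_as_same_term for document 888888 are an assumption chosen to be consistent with the responses shown. The names term_centric and field_centric are hypothetical.

```python
# Toy model of the two parsed-query shapes above; this is not Solr code.
# Each field is reduced to a set of indexed tokens (an assumption).
DOCS = {
    "888888": {"author": {"united"}, "subjects_as_same_term": {"kingdom"}},
    "77777": {"author": {"united", "kingdom"}, "subjects_as_same_term": set()},
}


def term_centric(doc, terms, fields, mm):
    """((f1:t1 | f2:t1) (f1:t2 | f2:t2))~mm: count terms found in any field."""
    matched = sum(1 for t in terms if any(t in doc[f] for f in fields))
    return matched >= mm


def field_centric(doc, per_field):
    """(f1 clauses)~mm1 | (f2 clauses)~mm2: one field must satisfy its own mm."""
    return any(
        sum(1 for c in clauses if c in doc[field]) >= mm
        for field, (clauses, mm) in per_field.items()
    )


sow_true_hits = [
    doc_id for doc_id, doc in DOCS.items()
    if term_centric(doc, ["united", "kingdom"], ["author", "subjects_as_same_term"], 2)
]

SOW_FALSE_QUERY = {
    "author": (["united", "kingdom"], 2),
    # Synonym expansion collapsed the two terms into one position, hence ~1.
    "subjects_as_same_term": (
        ["uk", "united kingdom", "england", "london", "british", "britain"], 1),
}
sow_false_hits = [doc_id for doc_id, doc in DOCS.items() if field_centric(doc, SOW_FALSE_QUERY)]

print(sow_true_hits)   # ['888888', '77777']
print(sow_false_hits)  # ['77777']
```

Under these assumptions the model reproduces the two result sets above: document 888888 satisfies mm=2 only when the terms may come from different fields.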
> h1. Problem
> When a field's text analysis is incompatible with the query text, mm is not 
> fully respected:
> {code:java}
> sow = false
> mm=100%
> qf = text numeric_i
> q = terminator 100
> defType = edismax
> "parsedquery_toString":
> "+(((text:terminator text:100)~2) | 
> (numeric_i:100)~1))"
> {code}
> A document containing only '100' in the field numeric_i is returned as a 
> search result, even though it does not satisfy mm=100%.
> Reference: 
> https://sease.io/2021/05/apache-solr-sow-parameter-split-on-whitespace-and-multi-field-full-text-search.html
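
The problem case can be sketched with a small toy model, again not Solr code. It assumes that mm is resolved separately for each field's clause list, as the parsed query suggests, and that a positive percentage is rounded down. The document and the helper name required are hypothetical.

```python
# Toy model of the field-centric query from the Problem section; not Solr code.
QUERY_TERMS = ["terminator", "100"]
MM_PERCENT = 100

# Clauses that survived analysis in each field, as shown in parsedquery_toString.
CLAUSES = {"text": ["terminator", "100"], "numeric_i": ["100"]}

# Hypothetical document: only numeric_i holds a value.
doc = {"text": set(), "numeric_i": {"100"}}


def required(clause_count, percent):
    """Clauses that must match for a positive mm percentage (rounded down)."""
    return clause_count * percent // 100


per_field_mm = {f: required(len(c), MM_PERCENT) for f, c in CLAUSES.items()}
matches = any(
    sum(1 for c in CLAUSES[f] if c in doc[f]) >= per_field_mm[f] for f in CLAUSES
)
terms_found = [t for t in QUERY_TERMS if any(t in tokens for tokens in doc.values())]

print(per_field_mm)  # {'text': 2, 'numeric_i': 1}
print(matches)       # True
print(terms_found)   # ['100']
```

In this model the document matches because numeric_i satisfies its own one-clause requirement, although only one of the two query terms is found anywhere in the document.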



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

