[https://issues.apache.org/jira/browse/SOLR-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Mayya Sharipova updated SOLR-15449:
-----------------------------------
Security: (was: Public)
> edismax sow causes issues with minimum should match in case of multi field with different analysis
> -------------------------------------------------------------------------------------------------
>
> Key: SOLR-15449
> URL: https://issues.apache.org/jira/browse/SOLR-15449
> Project: Solr
> Issue Type: Bug
> Affects Versions: 8.8.2
> Reporter: Alessandro Benedetti
> Priority: Major
> Fix For: main (9.0)
>
> Time Spent: 4h
> Remaining Estimate: 0h
>
> h1. Intro
> In multi-field search, where the text analysis of each field produces a different number of tokens:
> sow=true makes minimum should match (mm) apply "per document", i.e. to match, a document must contain each of the query terms required by mm at least once, in any of the query fields.
> sow=false makes mm apply "per field", i.e. to match, a document must contain each of the query terms required by mm at least once within a single field.
> When the parsed query moves from being term-centric (sow=true) to field-centric (sow=false with different text analysis per field), mm means two different things:
> {code:java}
> sow = true
> mm=2
> qf = author subjects_as_same_term
> q = united kingdom
> defType = edismax
> "parsedquery_toString":
> "+(((author:united | subjects_as_same_term:united) (author:kingdom |
> subjects_as_same_term:kingdom))~2)"
> {code}
> {code:java}
> "response":{"numFound":2,"start":0,"maxScore":7.757958,"numFoundExact":true,"docs":[
> {
> "id":"888888",
> "author":"united",
> "subjects":["kingdom"],
> "score":7.757958},
> {
> "id":"77777",
> "author":"united kingdom",
> "score":5.874222}]
> },
> {code}
> Minimum number of query terms matched within the same field (i.e. all the required query terms must be found in a single field):
> “PER FIELD”
> {code:java}
> sow = false
> mm=2
> qf = author subjects_as_same_term
> q = united kingdom
> defType = edismax
> "parsedquery_toString":
> "+(((author:united author:kingdom)~2) |
> (((subjects_as_same_term:uk subjects_as_same_term:"united kingdom"
> subjects_as_same_term:england subjects_as_same_term:london
> subjects_as_same_term:british subjects_as_same_term:britain))~1))"
> {code}
> The clause (author:united author:kingdom)~2 means both clauses must match for a document to be a good candidate. It is in disjunction with
> ((subjects_as_same_term:uk subjects_as_same_term:"united kingdom" subjects_as_same_term:england subjects_as_same_term:london subjects_as_same_term:british subjects_as_same_term:britain))~1, which means at least one clause must match, because in this field the synonym expansion turned the two original terms into a single clause, so mm=2 becomes ~1.
> {code:java}
> "response":{"numFound":1,"start":0,"maxScore":5.874222,"numFoundExact":true,"docs":[
> {
> "id":"77777",
> "author":"united kingdom",
> "score":5.874222}]
> }
> {code}
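The difference between the two mm semantics in the examples above can be reproduced with a small Java sketch. It is a simplified model, not Solr's implementation: the class MinShouldMatchModes and its methods are hypothetical, each document is assumed to be a map from field name to already-analyzed tokens, and synonym expansion and scoring are ignored. For the two example documents with mm=2, the per-document rule accepts both 888888 and 77777, while the per-field rule accepts only 77777, which is consistent with the numFound values of the two responses.

{code:java}
import java.util.List;
import java.util.Map;

// Hypothetical model of the two mm semantics; not Solr code.
public class MinShouldMatchModes {

    // Per document (term-centric, sow=true): a query term counts if it
    // appears in ANY field; the document matches if at least mm terms count.
    static boolean matchesPerDocument(Map<String, List<String>> doc, List<String> terms, int mm) {
        int matched = 0;
        for (String term : terms) {
            boolean found = doc.values().stream().anyMatch(tokens -> tokens.contains(term));
            if (found) {
                matched++;
            }
        }
        return matched >= mm;
    }

    // Per field (field-centric, sow=false): the document matches only if a
    // SINGLE field contains at least mm of the query terms.
    static boolean matchesPerField(Map<String, List<String>> doc, List<String> terms, int mm) {
        for (List<String> tokens : doc.values()) {
            long matched = terms.stream().filter(tokens::contains).count();
            if (matched >= mm) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("united", "kingdom");
        int mm = 2;
        // Assumed analyzed tokens for the two example documents.
        Map<String, Map<String, List<String>>> docs = Map.of(
            "888888", Map.of("author", List.of("united"), "subjects", List.of("kingdom")),
            "77777", Map.of("author", List.of("united", "kingdom")));

        for (String id : List.of("888888", "77777")) {
            Map<String, List<String>> doc = docs.get(id);
            System.out.println(id + " perDocument=" + matchesPerDocument(doc, terms, mm)
                + " perField=" + matchesPerField(doc, terms, mm));
        }
    }
}
{code}

Running main prints "888888 perDocument=true perField=false" followed by "77777 perDocument=true perField=true".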
> h1. Problem
> When a field's text analysis is incompatible with the query text, mm is not fully respected:
> {code:java}
> sow = false
> mm=100%
> qf = text numeric_i
> q = terminator 100
> defType = edismax
> "parsedquery_toString":
> "+(((text:terminator text:100)~2) |
> ((numeric_i:100)~1))"
> {code}
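> A document that only contains '100' in the numeric_i field is returned as a good search result, even though it does not satisfy mm=100%: the numeric_i disjunct has a single clause, so matching 100 alone satisfies its ~1, and terminator is never required.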
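Why each field's disjunct gets its own ~N can be illustrated by modelling how an mm value turns into a required clause count per field. This Java sketch is a simplification under stated assumptions: the class EffectiveMinShouldMatch and the method effectiveMm are hypothetical, only a plain integer or a positive percentage is handled (mm also accepts negative and conditional values, which are rejected here), and the result is capped at the number of clauses the field actually produced. That cap is an assumption chosen because it is consistent with the parsed queries in this issue: two clauses give ~2, while a single clause (the synonym group, or numeric_i:100) gives ~1.

{code:java}
// Hypothetical, simplified model of per-field mm resolution; not Solr code.
public class EffectiveMinShouldMatch {

    // Turns an mm spec ("2" or "100%") into the number of clauses that must
    // match, capped at the number of optional clauses the field produced.
    static int effectiveMm(String spec, int optionalClauses) {
        if (spec.startsWith("-") || spec.contains("<")) {
            throw new IllegalArgumentException("negative/conditional mm not modelled: " + spec);
        }
        int required;
        if (spec.endsWith("%")) {
            int percent = Integer.parseInt(spec.substring(0, spec.length() - 1));
            // Truncate the fractional part, e.g. 50% of 3 clauses -> 1.
            required = (int) (optionalClauses * percent / 100.0f);
        } else {
            required = Integer.parseInt(spec);
        }
        // The requirement never exceeds the clauses available in the field.
        return Math.min(required, optionalClauses);
    }

    public static void main(String[] args) {
        // q = terminator 100: text yields 2 clauses, numeric_i only 1 ("100").
        System.out.println("text      ~" + effectiveMm("100%", 2));
        System.out.println("numeric_i ~" + effectiveMm("100%", 1));
        // sow=false example: author keeps 2 clauses, the synonym group is 1.
        System.out.println("author    ~" + effectiveMm("2", 2));
        System.out.println("subjects  ~" + effectiveMm("2", 1));
    }
}
{code}

Under this model, mm=100% is satisfied by the numeric_i disjunct on its own, which is exactly the behaviour reported above.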
> Reference:
> https://sease.io/2021/05/apache-solr-sow-parameter-split-on-whitespace-and-multi-field-full-text-search.html
--
This message was sent by Atlassian Jira
(v8.3.4#803005)