[ 
https://issues.apache.org/jira/browse/LUCENE-8840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858580#comment-16858580
 ] 

Atri Sharma commented on LUCENE-8840:
-------------------------------------

I am curious to understand how including doc frequencies can be better than 
including the overall score. IMO, including BM25 scores gives us some 
additional advantages, such as defending against cases where the overall 
non-matching token count in a document is significantly high. Did you see any 
scenarios that had relevance troubles due to the inclusion of entire BM25 
scores?
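
To make the comparison concrete, here is a minimal sketch of the two 
alternatives as I read them. It is not Lucene's code: it assumes the textbook 
BM25 term-frequency saturation with idf = 1 and no length normalization, it 
reads "doc frequencies" as per-document term frequencies, and the class and 
method names are hypothetical.

```java
// Illustrative only: textbook BM25 tf saturation, idf = 1, no length
// normalization (b = 0). All names here are hypothetical, not Lucene APIs.
final class BlendedScoreSketch {

    // Saturating tf component: grows with freq but stays below k1 + 1.
    static double tfNorm(double freq, double k1) {
        return freq * (k1 + 1) / (freq + k1);
    }

    // Disjunction-style scoring: every term adds its own saturated score.
    static double sumOfScores(int[] freqs, double k1) {
        double total = 0;
        for (int f : freqs) {
            total += tfNorm(f, k1);
        }
        return total;
    }

    // Synonym-style scoring: frequencies are summed first, then saturated once.
    static double scoreOfSummedFreqs(int[] freqs, double k1) {
        int sum = 0;
        for (int f : freqs) {
            sum += f;
        }
        return tfNorm(sum, k1);
    }

    public static void main(String[] args) {
        int[] freqs = {1, 1, 1}; // three fuzzy variants, each occurring once
        double k1 = 1.2;
        System.out.println("sum of scores:         " + sumOfScores(freqs, k1));
        System.out.println("score of summed freqs: " + scoreOfSummedFreqs(freqs, k1));
    }
}
```

Under these assumptions, three variants that each occur once give about 3.0 
when the scores are summed and about 1.57 when the frequencies are summed 
first, and the second value can never exceed k1 + 1.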

IMO, if we want to restrict the contribution of each term to the blended 
query's final score, we could think of a blended scorer step that applies 
something along the lines of BM25's term frequency saturation when merging 
scores from the different blended terms. WDYT?
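
As a rough illustration of what I mean by such a merge step, the sketch below 
sums the per-term scores and then passes the sum through a BM25-style 
saturation curve. This is only one possible reading of the idea: the class, 
the method and the parameter k are hypothetical, and nothing like this exists 
in Lucene today as far as this comment is concerned.

```java
// Hypothetical merge step: per-term scores are summed, then the sum is
// saturated the same way BM25 saturates term frequency. Not a Lucene API.
final class SaturatingMergeSketch {

    // Returns a merged score that grows with each extra matching term but
    // stays below k + 1. Assumes k > 0; negative scores are treated as 0.
    static double merge(double[] termScores, double k) {
        double sum = 0;
        for (double s : termScores) {
            sum += Math.max(0, s);
        }
        return sum * (k + 1) / (sum + k);
    }

    public static void main(String[] args) {
        double k = 1.2; // assumed saturation parameter, by analogy with k1
        System.out.println(merge(new double[] {1.0}, k));
        System.out.println(merge(new double[] {1.0, 1.0}, k));
        System.out.println(merge(new double[] {1.0, 1.0, 1.0}, k));
    }
}
```

Each additional blended term still raises the merged score, but by less than 
the previous one, so no single fuzzy variant can dominate the final score.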

On a different note, I am also wondering if we should devise relevance tests 
that let us measure the relevance impact of a change. Something added to 
luceneutil would be nice. Thoughts?

> TopTermsBlendedFreqScoringRewrite should use SynonymQuery
> ---------------------------------------------------------
>
>                 Key: LUCENE-8840
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8840
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Jim Ferenczi
>            Priority: Major
>         Attachments: LUCENE-8840.patch
>
>
> Today the TopTermsBlendedFreqScoringRewrite, which is the default rewrite 
> method for Fuzzy queries, uses the BlendedTermQuery to score documents that 
> match the fuzzy terms. This query blends the frequencies used for scoring 
> across the terms and creates a disjunction of all the blended terms. This 
> means that each fuzzy term that matches in a document will add its BM25 score 
> contribution. We already have a query that can blend the statistics of 
> multiple terms in a single scorer that sums the doc frequencies rather than 
> the entire BM25 score: the SynonymQuery. Since 
> https://issues.apache.org/jira/browse/LUCENE-8652 this query also handles 
> boosts between 0 and 1, so it should be easy to change the default rewrite 
> method for Fuzzy queries to use it instead of the BlendedTermQuery. This 
> would bound the contribution of each term to the final score which seems a 
> better alternative in terms of relevancy than the current solution. 




