[ 
https://issues.apache.org/jira/browse/LUCENE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483899#comment-16483899
 ] 

Adrien Grand commented on LUCENE-8311:
--------------------------------------

Unfortunately I don't think this is due to this scoring issue, but rather to
the fact that a single position of a given term is allowed to be part of
several matches in sloppy phrases. For instance, say the query is {{"the
fox"~4}}, and {{the}} and {{fox}} have respective term frequencies of 5 and 1.
For an exact phrase, we can assume that the maximum frequency is 1 (the min
of both freqs). But for a sloppy phrase query, we could have a frequency of 4
if a document has 5 occurrences of {{the}} at position {{N}} (as synonyms of
each other) and 1 occurrence of {{fox}} at position {{N+1}}. Documents that
actually trigger this maximum frequency do not exist in practice, so the score
upper bounds that we compute are significantly higher than the scores that are
computed in practice. As a result, no block of documents is ever found to be
non-competitive, and so none is ever skipped.
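
To make the exact-phrase case concrete, here is a minimal sketch of that bound.
The class and method names are hypothetical and are not part of Lucene's API or
of the attached patch. It only shows that the phrase frequency of an exact
phrase cannot exceed the smallest term frequency among its terms.

```java
import java.util.Arrays;

/** Illustrative only: hypothetical helper, not a Lucene class. */
public final class PhraseFreqBound {
    private PhraseFreqBound() {}

    /**
     * Upper bound on the frequency of an exact phrase in a document,
     * given the frequency of each of its terms in that document.
     * Each exact match consumes one distinct occurrence of every term,
     * so the rarest term limits the number of matches.
     */
    public static int maxExactPhraseFreq(int... termFreqs) {
        if (termFreqs.length == 0) {
            throw new IllegalArgumentException("a phrase needs at least one term");
        }
        return Arrays.stream(termFreqs).min().getAsInt();
    }

    public static void main(String[] args) {
        // "the" occurs 5 times and "fox" once, as in the example above.
        System.out.println(maxExactPhraseFreq(5, 1));
    }
}
```

This reasoning does not carry over to sloppy phrases, since there a single
position may take part in several matches, so the min of the term frequencies
is no longer a valid upper bound.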

> Leverage impacts for phrase queries
> -----------------------------------
>
>                 Key: LUCENE-8311
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8311
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-8311.patch
>
>
> Now that we expose raw impacts, we could leverage them for phrase queries.
> For instance for exact phrases, we could take the minimum term frequency for 
> each unique norm value in order to get upper bounds of the score for the 
> phrase.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

