[jira] [Commented] (LUCENE-7993) Speed up phrase queries when total hit count is not needed

Robert Muir (JIRA) Sat, 14 Oct 2017 06:55:47 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16204641#comment-16204641 ]


Robert Muir commented on LUCENE-7993:
-------------------------------------

OK, I get it: the optimization is safe, since we actually call score for the 
doc with the theoretical maximum possible TF (taking its norm into account) 
just before reading positions.

Note that this optimization is definitely unsafe for some broken similarities 
(specifically the ones documented to be broken in this way, such as DFR model 
P), and probably also for certain parameters to e.g. DFR NormalizationXXX. 
Additionally, some similarities (such as AxiomaticXYZ) are not in our random 
test framework, so it's unknown whether it is safe for them. We could also use 
some explicit tests rather than relying on the test suite in that way.

But the requirement is 100% reasonable; we can't let some fundamentally broken 
formulas get in our way here :) I would go a step further and say that maybe 
these broken ones should move to the sandbox?

> Speed up phrase queries when total hit count is not needed
> ----------------------------------------------------------
>
>                 Key: LUCENE-7993
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7993
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7993.patch
>
>
> Follow-up of LUCENE-4100: When thinking about the API that we needed to 
> introduce to support MAXSCORE, I wondered whether the same API could support 
> other optimizations. The idea is that when running phrase queries, before we 
> start reading positions, we already have access to the term frequency of each 
> term. And the frequency of the phrase is bounded by the minimum term 
> frequency of the involved terms. So if the score for that minimum term 
> frequency is not competitive then it means that the score for the phrase is 
> not competitive either if we can assume that the score increases (or 
> stagnates) when the term freq increases, which sounds like an ok requirement 
> for a sane Similarity?
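
The idea in the description can be sketched roughly as follows, in Python 
rather than against the real Lucene API. The names `bm25_tf_score`, 
`can_skip_positions` and `min_competitive_score` are assumptions made up for 
this example, and the scoring function is a simplified BM25-style formula 
that ignores the norm. The sketch only relies on the score being 
non-decreasing in the term frequency.

```python
def bm25_tf_score(freq, idf=1.0, k1=1.2):
    """Simplified BM25-style score; non-decreasing in freq (norm ignored)."""
    return idf * freq * (k1 + 1) / (freq + k1)


def can_skip_positions(term_freqs, min_competitive_score,
                       score_fn=bm25_tf_score):
    """Return True if the phrase cannot be competitive for this document.

    term_freqs: frequency of each phrase term in the document.
    The phrase frequency is at most min(term_freqs), so score_fn of that
    minimum is an upper bound on the phrase score when score_fn is
    non-decreasing in freq.
    """
    max_phrase_freq = min(term_freqs)
    upper_bound = score_fn(max_phrase_freq)
    return upper_bound < min_competitive_score


# Upper bound is score(1) = 1.0, below the threshold: positions can be skipped.
print(can_skip_positions([3, 1, 7], min_competitive_score=1.5))
# Upper bound is score(4), about 1.69: positions still have to be read.
print(can_skip_positions([5, 4, 9], min_competitive_score=1.5))
```

If the score could decrease as freq grows, the value at the minimum term 
frequency would no longer be an upper bound and the skip would be unsafe.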



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

