[ https://issues.apache.org/jira/browse/LUCENE-6276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14962541#comment-14962541 ]
Paul Elschot commented on LUCENE-6276:
--------------------------------------
Where I could not easily determine a matchCost, I left it at zero and added a
CHECKME comment. These markers mainly indicate where refinement is possible.
Sorting subscorers/subspans by cost and matchCost is probably better than
relying on any given order.
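
To illustrate the sorting idea, here is a minimal Java sketch. It is not code
from any of the attached patches. SubIterator and sortByCost are hypothetical
names, and using cost as the primary key with matchCost as the tie-breaker is
only one possible reading of "by cost and matchCost".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MatchCostOrdering {

    /** Hypothetical stand-in for a subscorer/subspans; NOT a Lucene class. */
    static final class SubIterator {
        final String name;
        final long cost;       // estimated number of candidate documents
        final float matchCost; // estimated cost of confirming one candidate

        SubIterator(String name, long cost, float matchCost) {
            this.name = name;
            this.cost = cost;
            this.matchCost = matchCost;
        }
    }

    /**
     * Returns a copy of the input ordered by cost, then by matchCost.
     * Assumption: lower values are cheaper and should be tried first.
     */
    static List<SubIterator> sortByCost(List<SubIterator> subs) {
        List<SubIterator> sorted = new ArrayList<>(subs);
        sorted.sort(Comparator
                .comparingLong((SubIterator s) -> s.cost)
                .thenComparingDouble(s -> s.matchCost));
        return sorted;
    }

    public static void main(String[] args) {
        List<SubIterator> subs = Arrays.asList(
                new SubIterator("geo", 100L, 50f),
                new SubIterator("phrase", 100L, 5f),
                new SubIterator("rareTerm", 10L, 0f));
        for (SubIterator s : sortByCost(subs)) {
            System.out.println(s.name + " cost=" + s.cost + " matchCost=" + s.matchCost);
        }
    }
}
```

With this ordering the caller no longer depends on the order in which the
subscorers happened to be supplied. An iterator left at matchCost zero sorts
ahead of others with the same cost.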
Anyway, I don't expect the impact of matchCost on performance to be more than
4-8%, except maybe for really complex queries.
Showing the matchCost in explain will be tricky because it is computed per
LeafReaderContext, i.e. per segment.
The matchCost is not yet used for the second phase in disjunctions. Yet another
priority queue might be needed for that, so I'd prefer to delay that to another
issue.
> Add matchCost() api to TwoPhaseDocIdSetIterator
> -----------------------------------------------
>
> Key: LUCENE-6276
> URL: https://issues.apache.org/jira/browse/LUCENE-6276
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Robert Muir
> Attachments: LUCENE-6276-ExactPhraseOnly.patch,
> LUCENE-6276-NoSpans.patch, LUCENE-6276-NoSpans2.patch, LUCENE-6276.patch,
> LUCENE-6276.patch, LUCENE-6276.patch, LUCENE-6276.patch
>
>
> We could add a method like TwoPhaseDISI.matchCost(), defined as something like
> an estimate of nanoseconds or similar.
> ConjunctionScorer could use this method to sort its 'twoPhaseIterators' array
> so that cheaper ones are called first. Today it has no idea if one scorer is
> a simple phrase scorer on a short field vs another that might do some geo
> calculation or more expensive stuff.
> PhraseScorers could implement this based on index statistics (e.g.
> totalTermFreq/maxDoc)