[
https://issues.apache.org/jira/browse/LUCENE-6198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290078#comment-14290078
]
Robert Muir commented on LUCENE-6198:
-------------------------------------
Maybe, but currently the implementation (I added javadocs for some of my
approaches, not this one) might have certain needs. ExactPhraseScorer's
matches() needs everything to be already positioned on the document. It just
has to check hasFreq() > 0. ConjunctionScorer also has certain invariants about
its sub-scorers.
With this patch, the approximation is just a "view" of the actual scorer, so
next()/advance()'ing it must also position the parent (e.g. to then call
score()). Restricting the API to this "view" could be seen as unfriendly, but I
found it was the only way to keep good performance: e.g. we only ever really
have one DocsAndPositionsEnum; otherwise we would read redundant skip data and
unnecessary positions.
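
To make the "view" idea concrete, here is a minimal sketch in plain Java. It is not the patch and does not use Lucene's API: SharedPostings, ViewPhraseScorer and Approximation are hypothetical names, and the position check is replaced by a simple predicate. The only point it illustrates is that the approximation holds no iterator of its own, so advancing it also positions the parent scorer.

```java
import java.util.function.IntPredicate;

/** Hypothetical stand-in for a positions-aware postings iterator (not Lucene's API). */
final class SharedPostings {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs;
    private int idx = -1;

    SharedPostings(int... docs) { this.docs = docs; }

    int docID() {
        if (idx < 0) return -1;
        return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    /** Moves past the current doc to the first doc >= target. */
    int advance(int target) {
        idx++;
        while (idx < docs.length && docs[idx] < target) idx++;
        return docID();
    }
}

/** Toy phrase scorer whose approximation is a view over its own iterator. */
final class ViewPhraseScorer {
    private final SharedPostings postings;   // the only iterator that exists
    private final IntPredicate phraseCheck;  // assumed stand-in for reading positions

    ViewPhraseScorer(SharedPostings postings, IntPredicate phraseCheck) {
        this.postings = postings;
        this.phraseCheck = phraseCheck;
    }

    /** The view: it owns no state, so advancing it positions the parent too. */
    final class Approximation {
        int docID() { return postings.docID(); }
        int advance(int target) { return postings.advance(target); }
    }

    Approximation approximation() { return new Approximation(); }

    int docID() { return postings.docID(); }

    /** Only meaningful once the view has positioned the scorer on a document. */
    boolean matches() { return phraseCheck.test(postings.docID()); }
}

final class ViewDemo {
    public static void main(String[] args) {
        ViewPhraseScorer scorer =
            new ViewPhraseScorer(new SharedPostings(2, 4, 6, 8), d -> d == 8);
        ViewPhraseScorer.Approximation approx = scorer.approximation();
        int candidate = approx.advance(5);
        // The parent is on the same document as the view, so matches() can run.
        System.out.println(candidate + " " + scorer.docID() + " " + scorer.matches());
    }
}
```

Because there is a single underlying iterator, nothing has to be re-synchronized before matches() or score() is called; the trade-off is that a caller cannot move the view independently of the scorer.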
So I would rather the implementation (PhraseScorer) do any fanciness itself.
The current approach is "bottom-up" for nested query trees anyway. matches()
is "bubbled down", which means it's still better to have Conj(A, B, C) than
Conj(A, (B, C)), but that's not an issue I am trying to solve here, and it never
causes any unnecessary reads of positions, just the same unnecessary
advance()'ing being done already.
> two phase intersection
> ----------------------
>
> Key: LUCENE-6198
> URL: https://issues.apache.org/jira/browse/LUCENE-6198
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Robert Muir
> Attachments: LUCENE-6198.patch
>
>
> Currently some scorers have to do a lot of per-document work to determine if
> a document is a match. The simplest example is a phrase scorer, but there are
> others (spans, sloppy phrase, geospatial, etc).
> Imagine a conjunction with two MUST clauses, one that is a term that matches
> all odd documents, another that is a phrase matching all even documents.
> Today this conjunction will be very expensive, because the zig-zag
> intersection is reading a ton of useless positions.
> The same problem happens with FilteredQuery and anything else that acts like
> a conjunction.
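
The odd/even example from the issue description can be simulated with a small, self-contained Java sketch. Everything here is an assumption made for illustration: ToyIterator, ToyPhrase and TwoPhaseDemo are hypothetical names, the phrase's approximation is modelled as "the even documents", and a counter stands in for the cost of reading positions. It is not Lucene code and not the attached patch.

```java
import java.util.function.IntPredicate;
import java.util.stream.IntStream;

/** Hypothetical doc-id iterator over a sorted array (not Lucene's API). */
final class ToyIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs;
    private int idx = -1;

    ToyIterator(int[] docs) { this.docs = docs; }

    int docID() {
        if (idx < 0) return -1;
        return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    /** Moves past the current doc to the first doc >= target. */
    int advance(int target) {
        idx++;
        while (idx < docs.length && docs[idx] < target) idx++;
        return docID();
    }
}

/** Toy phrase: a cheap approximation plus a costly per-document check. */
final class ToyPhrase {
    final ToyIterator approximation;
    private final IntPredicate positionsMatch; // assumed stand-in for the position check
    int positionReads = 0;                     // counts simulated position reads

    ToyPhrase(ToyIterator approximation, IntPredicate positionsMatch) {
        this.approximation = approximation;
        this.positionsMatch = positionsMatch;
    }

    int docID() { return approximation.docID(); }

    boolean matches() {
        positionReads++;
        return positionsMatch.test(approximation.docID());
    }

    /** One-phase behaviour: every candidate is verified before it is returned. */
    int advanceToMatch(int target) {
        int doc = approximation.advance(target);
        while (doc != ToyIterator.NO_MORE_DOCS && !matches()) {
            doc = approximation.advance(doc + 1);
        }
        return doc;
    }
}

final class TwoPhaseDemo {
    /** Zig-zag intersection where the phrase verifies positions while advancing. */
    static int zigZag(ToyIterator term, ToyPhrase phrase) {
        int hits = 0;
        int doc = term.advance(0);
        while (doc != ToyIterator.NO_MORE_DOCS) {
            int other = phrase.docID() < doc ? phrase.advanceToMatch(doc) : phrase.docID();
            if (other == doc) { hits++; doc = term.advance(doc + 1); }
            else if (other == ToyIterator.NO_MORE_DOCS) break;
            else doc = term.advance(other);
        }
        return hits;
    }

    /** Intersect the approximations first; verify only where they agree. */
    static int twoPhase(ToyIterator term, ToyPhrase phrase) {
        int hits = 0;
        ToyIterator approx = phrase.approximation;
        int doc = term.advance(0);
        while (doc != ToyIterator.NO_MORE_DOCS) {
            int other = approx.docID() < doc ? approx.advance(doc) : approx.docID();
            if (other == doc) {
                if (phrase.matches()) hits++;
                doc = term.advance(doc + 1);
            } else if (other == ToyIterator.NO_MORE_DOCS) break;
            else doc = term.advance(other);
        }
        return hits;
    }

    /** Doc ids in [0, 100) with the given parity (0 = even, 1 = odd). */
    static int[] docsWithParity(int parity) {
        return IntStream.range(0, 100).filter(d -> d % 2 == parity).toArray();
    }

    public static void main(String[] args) {
        ToyPhrase a = new ToyPhrase(new ToyIterator(docsWithParity(0)), d -> true);
        int hitsA = zigZag(new ToyIterator(docsWithParity(1)), a);
        ToyPhrase b = new ToyPhrase(new ToyIterator(docsWithParity(0)), d -> true);
        int hitsB = twoPhase(new ToyIterator(docsWithParity(1)), b);
        System.out.println("zig-zag:   hits=" + hitsA + " positionReads=" + a.positionReads);
        System.out.println("two-phase: hits=" + hitsB + " positionReads=" + b.positionReads);
    }
}
```

In this toy model both strategies find no hits, but the zig-zag version verifies the phrase on nearly every even document while the two-phase version never reaches the position check, because the approximations never agree on a document.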