[jira] [Commented] (LUCENE-6198) two phase intersection

Robert Muir (JIRA) Fri, 23 Jan 2015 10:36:04 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-6198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289684#comment-14289684
 ]


Robert Muir commented on LUCENE-6198:
-------------------------------------

{quote}
The main thing that confuses me about what I see is the separation between 
TwoPhase & TwoPhaseApproximation despite the comments. Couldn't TwoPhase.verify 
return true, and getApproximation return ‘this’?
{quote}

It is confusing. TwoPhase does two-phase intersection: it works on 
approximations, but it is an "exact" scorer, e.g. it's what is used if you 
AND(term, phrase). 

However, it's possible to have nested conjunctions such as AND(term1, 
AND(term2, phrase)). So ConjunctionScorer itself supports approximations when 
any of its subs do. TwoPhaseApproximation is that impl, which defers matches() 
to the caller.

This way confirmation is deferred until there is "global docid" agreement 
across the whole query tree. With this patch it's only going to work with 
nested conjunctions, because that's all I implemented it for. Obviously, for 
it to work across the board (meaning you can put geo/phrase queries at 
arbitrary places in the query/filter tree and everything "works"), 
disjunctions and other boolean-like scorers must implement the API too when 
their subs support approximations.
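
To make the deferral concrete, here is a minimal, self-contained sketch of the 
idea for AND(term1, AND(term2, phrase)). It is not the code in the patch: the 
names Approximation, Leaf, Conjunction and search are hypothetical and made up 
for this illustration, and the "phrase" clause is simulated by a predicate 
instead of reading positions. The point is only that the conjunction zig-zags 
on the cheap approximations and calls matches() once the whole tree agrees on 
a docid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.stream.IntStream;

// Hypothetical sketch: every name here is illustrative, not the patch's API.
class TwoPhaseSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Cheap candidate iteration plus a deferred, possibly costly, confirmation. */
    interface Approximation {
        int advance(int target); // first candidate >= target (no-op if already there)
        boolean matches();       // confirm the current candidate
    }

    /** Leaf clause: sorted candidate doc ids and a predicate standing in for verification. */
    static final class Leaf implements Approximation {
        private final int[] docs;
        private final IntPredicate confirm;
        private int idx = -1;
        int confirmCalls = 0; // counts how often the "expensive" check runs

        Leaf(int[] docs, IntPredicate confirm) { this.docs = docs; this.confirm = confirm; }

        private int docID() {
            if (idx < 0) return -1;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        public int advance(int target) {
            while (docID() < target) idx++;
            return docID();
        }

        public boolean matches() {
            confirmCalls++;
            return confirm.test(docs[idx]);
        }
    }

    /** Conjunction that is itself an approximation, so it can be nested. */
    static final class Conjunction implements Approximation {
        private final Approximation[] subs;

        Conjunction(Approximation... subs) { this.subs = subs; }

        public int advance(int target) {
            int candidate = target;
            boolean agreed;
            do { // zig-zag over approximations only; nothing is confirmed here
                agreed = true;
                for (Approximation sub : subs) {
                    int d = sub.advance(candidate);
                    if (d != candidate) { candidate = d; agreed = false; }
                }
            } while (!agreed && candidate != NO_MORE_DOCS);
            return candidate;
        }

        public boolean matches() { // confirmation is deferred to whoever calls this
            for (Approximation sub : subs) {
                if (!sub.matches()) return false;
            }
            return true;
        }
    }

    /** Top-level driver: confirm only docs the whole tree has agreed on. */
    static List<Integer> search(Approximation root) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = root.advance(0); doc != NO_MORE_DOCS; doc = root.advance(doc + 1)) {
            if (root.matches()) hits.add(doc);
        }
        return hits;
    }

    /** Doc ids in [0, 20) that are multiples of step. */
    static int[] every(int step) {
        return IntStream.range(0, 20).filter(i -> i % step == 0).toArray();
    }

    public static void main(String[] args) {
        Leaf term1 = new Leaf(every(2), d -> true);
        Leaf term2 = new Leaf(every(3), d -> true);
        Leaf phrase = new Leaf(every(1), d -> d % 4 == 0); // simulated costly check
        List<Integer> hits = search(new Conjunction(term1, new Conjunction(term2, phrase)));
        System.out.println(hits + " with " + phrase.confirmCalls + " phrase confirmations");
    }
}
```

In this toy setup the simulated phrase clause has 20 candidate docs, but its 
check only runs for the docs where term1, term2 and the phrase approximation 
all line up (the multiples of 6), which is the saving the deferral is after.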

{quote}
BTW, please don't generalize all geo as being slow; there are multiple 
strategies with performance trade-offs for implementing geo.
{quote}

It was not a stab at geo or anything. Phrases are in the same category. It is 
just another use case where verifying that the document is actually a match is 
more costly than moving to the next "possible" document for the purpose of 
zig-zag intersection. ExactPhraseScorer is a tricky case since it's not TOO 
terribly expensive to verify a match, but it should still be a win. That's why 
I tried to prototype with it first.


> two phase intersection
> ----------------------
>
>                 Key: LUCENE-6198
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6198
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>         Attachments: LUCENE-6198.patch
>
>
> Currently some scorers have to do a lot of per-document work to determine if 
> a document is a match. The simplest example is a phrase scorer, but there are 
> others (spans, sloppy phrase, geospatial, etc).
> Imagine a conjunction with two MUST clauses, one that is a term that matches 
> all odd documents, another that is a phrase matching all even documents. 
> Today this conjunction will be very expensive, because the zig-zag 
> intersection is reading a ton of useless positions.
> The same problem happens with filteredQuery and anything else that acts like 
> a conjunction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

