[ 
https://issues.apache.org/jira/browse/LUCENE-6198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289900#comment-14289900
 ] 

Robert Muir commented on LUCENE-6198:
-------------------------------------

I think as a rule we should start with this API only being used for 
intersection, to avoid the slow operations. 

In the case of "the zoo" I think the current logic will work fine, because it 
returns a conjunction as the approximation, which sorts by cost, and will 
advance zoo (the leader). It's true that this requires advances from "the", but 
it's the only way to guarantee you avoid any unnecessary use of positions.
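
To make the two phases concrete, here is a minimal sketch in plain Java. It is not the attached patch and does not use Lucene classes: the names TwoPhaseSketch, approximate, confirm and positionChecks are hypothetical, postings are assumed to be sorted int arrays of doc IDs, and array length stands in for cost.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

// Illustrative only: hypothetical names, not Lucene API.
class TwoPhaseSketch {
    /** Counts how often the expensive confirmation step ran. */
    static int positionChecks = 0;

    /** First index >= from whose doc ID is >= target, or docs.length if none. */
    static int advance(int[] docs, int from, int target) {
        int i = from;
        while (i < docs.length && docs[i] < target) {
            i++;
        }
        return i;
    }

    /** Phase 1: doc IDs present in every list; the shortest list leads. */
    static List<Integer> approximate(int[][] postings) {
        int[][] sorted = postings.clone();
        // Sort by "cost" (assumed here to be the list length), cheapest first.
        Arrays.sort(sorted, Comparator.comparingInt(p -> p.length));
        int[] idx = new int[sorted.length];
        List<Integer> out = new ArrayList<>();
        for (int doc : sorted[0]) {
            boolean all = true;
            for (int k = 1; k < sorted.length && all; k++) {
                idx[k] = advance(sorted[k], idx[k], doc);
                all = idx[k] < sorted[k].length && sorted[k][idx[k]] == doc;
            }
            if (all) {
                out.add(doc);
            }
        }
        return out;
    }

    /** Phase 2: run the costly check only on approximate matches. */
    static List<Integer> confirm(List<Integer> candidates, IntPredicate phraseMatches) {
        List<Integer> out = new ArrayList<>();
        for (int doc : candidates) {
            positionChecks++; // stands in for reading positions
            if (phraseMatches.test(doc)) {
                out.add(doc);
            }
        }
        return out;
    }
}
```

With "the" in docs 1 through 8 and "zoo" in docs 3 and 7, approximate returns [3, 7], so confirm would read positions for only those two docs, whichever of them actually contains the phrase.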

Otherwise, if we just return "zoo" termsenum, it might save a little cpu for 
the approximation intersection, but in many cases can result in more wasted 
usages of positions because of a higher false positive rate. So I think it's 
currently too risky, without some restructuring/context (e.g. the cost of "the 
other guy" we are intersecting and whether this might be worth the effort).
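
The false positive tradeoff can be illustrated with a toy count. This is only a sketch under assumptions: ApproximationCost, both and wastedChecks are hypothetical names, doc ID sets stand in for postings, and a "wasted" check is taken to mean a candidate that survives the intersection with the other clause but is not a real phrase match.

```java
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: hypothetical names, not Lucene API.
class ApproximationCost {
    /** Docs contained in both sets, e.g. the conjunction approximation. */
    static Set<Integer> both(Set<Integer> a, Set<Integer> b) {
        Set<Integer> out = new TreeSet<>(a);
        out.retainAll(b);
        return out;
    }

    /**
     * Number of position checks that turn out to be false positives:
     * docs in both the approximation and the other clause that do not
     * actually match the phrase.
     */
    static int wastedChecks(Set<Integer> approximation,
                            Set<Integer> otherClause,
                            Set<Integer> phraseMatches) {
        int wasted = 0;
        for (int doc : both(approximation, otherClause)) {
            if (!phraseMatches.contains(doc)) {
                wasted++;
            }
        }
        return wasted;
    }
}
```

For example, take made-up postings the = {1, 2, 4, 5, 7, 8, 9}, zoo = {2, 3, 6, 7, 9}, real phrase matches {7}, and an other clause matching {2, 3, 6, 7, 10}. Using zoo alone as the approximation wastes 3 position checks (docs 2, 3, 6), while using the conjunction of both terms wastes 1 (doc 2).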

But yes, I think we could do more in the future, such as using a tiered 
approach (zoo, the zoo, "the zoo") or other possibilities. Mainly for 
prototyping I wanted something minimal to start, and I am worried that being 
too fancy can hurt simple queries if we are not careful.

> two phase intersection
> ----------------------
>
>                 Key: LUCENE-6198
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6198
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>         Attachments: LUCENE-6198.patch
>
>
> Currently some scorers have to do a lot of per-document work to determine if 
> a document is a match. The simplest example is a phrase scorer, but there are 
> others (spans, sloppy phrase, geospatial, etc).
> Imagine a conjunction with two MUST clauses, one that is a term that matches 
> all odd documents, another that is a phrase matching all even documents. 
> Today this conjunction will be very expensive, because the zig-zag 
> intersection is reading a ton of useless positions.
> The same problem happens with FilteredQuery and anything else that acts like 
> a conjunction.


