[jira] [Commented] (LUCENE-6198) two phase intersection

Robert Muir (JIRA) Fri, 23 Jan 2015 15:41:48 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-6198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290220#comment-14290220
 ]


Robert Muir commented on LUCENE-6198:
-------------------------------------

{quote}
If we were doing pure greenfield API design, w/o any concern for backcompat, 
and we wanted to support this type of multi-phase intersection support in the 
Doc iterators, what would we want it to look like?
{quote}

Then it would not be on DocIdSetIterator. I put it there to stay reasonably 
backwards compatible and to try to make filters fast too. But it does not 
belong there, so the changes are intended to be minimal and to interfere as 
little as possible with things like postings lists.

I'll be honest: I don't want to wait until 6.0 to fix the performance of 
proximity queries. And to do things properly IMO, Filter needs to be removed 
and merged with Query for that to happen. I don't think I can do that easily in 
5.x.

{quote}
...and eliminate the need for callers/wrappers to know/care if/when the 
implementation supports "approximations" – instead all callers just check 
candidateIsMatch() and we trust the JVM to optimize that call away when it's a 
constant.
{quote}

But there are several problems here:
1. The need is actually important. That's why the two-phase scorer has a 
separate array holding the ones it must verify: only the approximate ones! What 
particular compiler optimization did you think should help in that case so that 
conjunctions don't have performance issues? I develop with the latest 8u40-ea.
2. It makes things much more difficult for a _single Java object_ to support 
both approximate and exact iteration. It's by far easier to just return 
something around some underlying structure that already exists (e.g. the 
underlying conjunction of a phrase) and not have to support switching back and 
forth and so on.
3. I don't think Lucene should have scoring or iteration APIs that are 
"approximate by definition". That is *by far* too hard to use for the 90% case. 
Instead we should have the ability to explicitly "ask" for an approximation in 
special cases, such as zig-zag intersection, document filtering, etc.
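
To make the three points concrete, here is a minimal Java sketch of the "explicit ask" shape. It is not the attached patch and not a Lucene API: DocIterator, TwoPhase, asTwoPhase(), ArrayIterator and PhraseLike are all hypothetical names invented for illustration, and sorted int arrays stand in for postings. Iterators stay exact by default; only a conjunction asks for an approximation, and it keeps a separate list of just the clauses it must verify.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch only; none of these types are Lucene APIs. */
public class TwoPhaseSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Exact by default: every doc returned is a real match. */
    interface DocIterator {
        int advance(int target);                        // first match >= target
        default TwoPhase asTwoPhase() { return null; }  // null: nothing cheaper to offer
    }

    /** Explicit opt-in view: a cheap superset iterator plus a costly check. */
    interface TwoPhase {
        DocIterator approximation();
        boolean matches(int doc);
    }

    /** Iterator over sorted doc ids (stands in for a postings list). */
    static final class ArrayIterator implements DocIterator {
        private final int[] docs;
        private int i = 0;
        ArrayIterator(int... docs) { this.docs = docs; }
        public int advance(int target) {
            while (i < docs.length && docs[i] < target) i++;
            return i < docs.length ? docs[i] : NO_MORE_DOCS;
        }
    }

    /** Stand-in for a phrase scorer: candidates are cheap, verification is not. */
    static final class PhraseLike implements DocIterator {
        private final int[] candidates;  // docs containing all terms (sorted)
        private final int[] matches;     // docs where the phrase occurs (sorted)
        int positionReads = 0;           // counts simulated position reads
        private int i = 0, doc = -1;
        PhraseLike(int[] candidates, int[] matches) {
            this.candidates = candidates;
            this.matches = matches;
        }
        private boolean verify(int d) {
            positionReads++;
            return Arrays.binarySearch(matches, d) >= 0;
        }
        public int advance(int target) {  // exact: verifies candidates as it goes
            if (doc >= target) return doc;
            while (i < candidates.length) {
                int c = candidates[i++];
                if (c >= target && verify(c)) return doc = c;
            }
            return doc = NO_MORE_DOCS;
        }
        public TwoPhase asTwoPhase() {
            // Wrap the structure that already exists instead of switching modes.
            DocIterator approx = new ArrayIterator(candidates);
            return new TwoPhase() {
                public DocIterator approximation() { return approx; }
                public boolean matches(int d) { return verify(d); }
            };
        }
    }

    /** Zig-zag over approximations, then verify only the approximate clauses. */
    static List<Integer> intersect(DocIterator... clauses) {
        DocIterator[] approx = new DocIterator[clauses.length];
        List<TwoPhase> toVerify = new ArrayList<>();
        for (int k = 0; k < clauses.length; k++) {
            TwoPhase tp = clauses[k].asTwoPhase();
            approx[k] = (tp == null) ? clauses[k] : tp.approximation();
            if (tp != null) toVerify.add(tp);
        }
        List<Integer> hits = new ArrayList<>();
        int doc = approx[0].advance(0);
        while (doc != NO_MORE_DOCS) {
            boolean agreed = true;
            for (int k = 1; k < approx.length; k++) {      // phase 1: doc ids only
                int other = approx[k].advance(doc);
                if (other != doc) { doc = approx[0].advance(other); agreed = false; break; }
            }
            if (!agreed) continue;
            boolean ok = true;
            for (TwoPhase tp : toVerify) {                 // phase 2: costly check
                if (!tp.matches(doc)) { ok = false; break; }
            }
            if (ok) hits.add(doc);
            doc = approx[0].advance(doc + 1);
        }
        return hits;
    }

    public static void main(String[] args) {
        PhraseLike phrase = new PhraseLike(new int[] {2, 4, 6, 8}, new int[] {4, 6});
        System.out.println(intersect(new ArrayIterator(1, 3, 4, 7, 8), phrase)
            + " reads=" + phrase.positionReads);
        PhraseLike exact = new PhraseLike(new int[] {2, 4, 6, 8}, new int[] {4, 6});
        DocIterator exactOnly = exact::advance;            // hides asTwoPhase()
        System.out.println(intersect(new ArrayIterator(1, 3, 4, 7, 8), exactOnly)
            + " reads=" + exact.positionReads);
    }
}
```

In this toy data both runs find the same single hit, but the run that asks for the approximation verifies only the docs where the clauses already agree, so it performs fewer simulated position reads than the exact-only run. Callers that never ask simply see an exact iterator.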

> two phase intersection
> ----------------------
>
>                 Key: LUCENE-6198
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6198
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>         Attachments: LUCENE-6198.patch
>
>
> Currently some scorers have to do a lot of per-document work to determine if 
> a document is a match. The simplest example is a phrase scorer, but there are 
> others (spans, sloppy phrase, geospatial, etc).
> Imagine a conjunction with two MUST clauses, one that is a term that matches 
> all odd documents, another that is a phrase matching all even documents. 
> Today this conjunction will be very expensive, because the zig-zag 
> intersection is reading a ton of useless positions.
> The same problem happens with FilteredQuery and anything else that acts like 
> a conjunction.
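
The odd/even example from the description can be turned into a rough cost count. The Java sketch below is a toy simulation, not Lucene code: OddEvenCost and its methods are hypothetical names, a "position read" is just a counter, and it assumes the phrase's terms co-occur only in the even documents, so the cheap approximation of the phrase is exactly the even doc ids.

```java
/** Toy cost model for the odd/even conjunction; all names are hypothetical. */
public class OddEvenCost {
    static int nextEven(int d) { return d + (d & 1); }  // smallest even doc >= d
    static int nextOdd(int d)  { return d | 1; }        // smallest odd doc >= d

    /** Today: the phrase clause reads positions for every doc it lands on. */
    static int naivePositionReads(int maxDoc) {
        int reads = 0;
        int term = 1;                           // term clause matches odd docs
        while (term < maxDoc) {
            int phrase = nextEven(term);        // phrase.advance(term)
            if (phrase >= maxDoc) break;
            reads++;                            // positions read to confirm the phrase
            term = nextOdd(phrase);             // term.advance(phrase): never equal
        }
        return reads;
    }

    /** Two-phase: positions are read only when the doc ids already agree. */
    static int twoPhasePositionReads(int maxDoc) {
        int reads = 0;
        int term = 1;
        while (term < maxDoc) {
            int approx = nextEven(term);        // phase 1: doc ids only
            if (approx >= maxDoc) break;
            if (approx == term) {               // phase 2: verify with positions
                reads++;
                term = nextOdd(term + 1);
            } else {
                term = nextOdd(approx);
            }
        }
        return reads;
    }

    public static void main(String[] args) {
        System.out.println("naive:     " + naivePositionReads(1000));
        System.out.println("two-phase: " + twoPhasePositionReads(1000));
    }
}
```

Under these assumptions the conjunction has no hits at all, yet the naive zig-zag pays a position read on every step, while the two-phase version pays none because an odd doc id never equals an even one.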



