[jira] [Commented] (LUCENE-6198) two phase intersection

Hoss Man (JIRA) Fri, 23 Jan 2015 14:13:51 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-6198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290088#comment-14290088 ]


Hoss Man commented on LUCENE-6198:
----------------------------------


Spitballing...

{quote}
bq. I had thought about this a while ago via adding something like ...

Yeah, I explored many similar things because I hate the API I have.

...

Other options I tried got tricky. The problem is, I think we really want it to 
work for Filters too, so things must be at this very low DocIdSetIterator level 
(versus Scorer, or even DocsEnum, where it could maybe be done more 
intuitively). When looking at changes to DocIdSetIterator, I definitely wanted 
it to be an optional thing because it's so widespread, to minimize impact on the 
codebase.
{quote}

If we were doing pure greenfield API design, w/o any concern for backcompat, 
and we wanted to support this type of multi-phase intersection support in the 
Doc iterators, what would we want it to look like?

Would it make sense if, _instead of_ {{docID()}}, {{nextDoc()}}, and 
{{advance(int)}}, the DISI API looked something like...

{code}
import java.io.IOException;

public abstract class DocIdSetIterator {
  public abstract long cost();
  public abstract int nextCandidateDoc() throws IOException; // nextDoc
  public abstract int advanceNextCandidateBeyond(int target) throws IOException; // advance
  public boolean candidateIsMatch() throws IOException {
    return true;
  }
  /** meaningless unless {@link #candidateIsMatch} is true */
  public abstract int candidateDocID(); // docID
}
{code}

...and eliminate the need for callers/wrappers to know or care if/when the 
implementation supports "approximations"? Instead, all callers just check 
{{candidateIsMatch()}}, and we trust the JVM to optimize that call away when 
it's a constant.
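To make the caller side of this proposal concrete, here is a minimal sketch of a consumer that always checks {{candidateIsMatch()}} without knowing whether the iterator is exact or approximate. It is a toy under stated assumptions, not Lucene code: {{ProposedDisi}} is a hypothetical stand-in for the API above, {{ArrayDisi}} and {{Collect}} are hypothetical helpers, {{advanceNextCandidateBeyond}} is assumed to keep {{advance()}}'s "first candidate >= target" contract, and {{NO_MORE_DOCS}} is assumed to be {{Integer.MAX_VALUE}} as in today's DocIdSetIterator.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical stand-in for the proposed API above; NOT Lucene's real DocIdSetIterator.
abstract class ProposedDisi {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  public abstract long cost();
  public abstract int nextCandidateDoc() throws IOException;
  // Assumption: same contract as advance(), i.e. the first candidate >= target.
  public abstract int advanceNextCandidateBeyond(int target) throws IOException;
  // Exact iterators (e.g. a plain term) keep this constant default.
  public boolean candidateIsMatch() throws IOException {
    return true;
  }
  public abstract int candidateDocID();
}

// Hypothetical array-backed iterator. A non-null verifier stands in for the
// expensive per-document check (e.g. reading phrase positions).
final class ArrayDisi extends ProposedDisi {
  private final int[] docs; // sorted, strictly increasing candidate doc IDs
  private final IntPredicate verifier;
  private int idx = -1;
  int verifications = 0; // how many times the expensive check ran

  ArrayDisi(int[] docs, IntPredicate verifier) {
    this.docs = docs;
    this.verifier = verifier;
  }

  @Override public long cost() { return docs.length; }

  @Override public int nextCandidateDoc() {
    idx++;
    return candidateDocID();
  }

  @Override public int advanceNextCandidateBeyond(int target) {
    do { idx++; } while (idx < docs.length && docs[idx] < target);
    return candidateDocID();
  }

  @Override public boolean candidateIsMatch() {
    if (verifier == null) return true;
    verifications++;
    return verifier.test(candidateDocID());
  }

  @Override public int candidateDocID() {
    if (idx < 0) return -1;
    return idx >= docs.length ? NO_MORE_DOCS : docs[idx];
  }
}

final class Collect {
  // The caller never asks whether approximations are supported: it always checks.
  static List<Integer> matches(ProposedDisi it) throws IOException {
    List<Integer> out = new ArrayList<>();
    for (int doc = it.nextCandidateDoc(); doc != ProposedDisi.NO_MORE_DOCS;
         doc = it.nextCandidateDoc()) {
      if (it.candidateIsMatch()) out.add(doc);
    }
    return out;
  }
}
```

The same loop works unchanged for an exact iterator (null verifier) and an approximate one; only the latter actually runs the expensive check, once per candidate it visits.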


Would that be an API people preferred over the current patch?

> two phase intersection
> ----------------------
>
>                 Key: LUCENE-6198
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6198
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>         Attachments: LUCENE-6198.patch
>
>
> Currently some scorers have to do a lot of per-document work to determine if 
> a document is a match. The simplest example is a phrase scorer, but there are 
> others (spans, sloppy phrase, geospatial, etc).
> Imagine a conjunction with two MUST clauses, one that is a term that matches 
> all odd documents, another that is a phrase matching all even documents. 
> Today this conjunction will be very expensive, because the zig-zag 
> intersection is reading a ton of useless positions.
> The same problem happens with filteredQuery and anything else that acts like 
> a conjunction.
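The odd/even conjunction from the description can be simulated with a small toy model that counts how often the expensive per-document check runs. Everything here is hypothetical: {{Clause}} and {{Intersect}} are illustrative names, each clause is a sorted candidate array (its cheap approximation) plus a predicate standing in for reading positions, and the phrase's approximation is assumed to be every document (all of its terms occur everywhere) while only even documents actually match the phrase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical clause: a sorted candidate array (cheap approximation) plus an
// optional expensive check; null means the approximation is exact (e.g. a term).
final class Clause {
  static final int END = Integer.MAX_VALUE;
  private final int[] approx;
  private final IntPredicate check;
  private int idx = -1;
  int checks = 0; // number of expensive checks performed

  Clause(int[] approx, IntPredicate check) {
    this.approx = approx;
    this.check = check;
  }

  int doc() {
    if (idx < 0) return -1;
    return idx >= approx.length ? END : approx[idx];
  }

  // Cheap phase: first candidate >= target (target must exceed doc()).
  int approxAdvance(int target) {
    do { idx++; } while (idx < approx.length && approx[idx] < target);
    return doc();
  }

  // Expensive phase: is the current candidate a real match?
  boolean matches() {
    if (check == null) return true;
    checks++;
    return check.test(doc());
  }

  // Single-phase advance, like today's scorers: stops only on verified matches.
  int advance(int target) {
    int d = approxAdvance(target);
    while (d != END && !matches()) d = approxAdvance(d + 1);
    return d;
  }
}

final class Intersect {
  // Today's zig-zag: every advance on the phrase verifies each candidate it passes.
  static List<Integer> singlePhase(Clause a, Clause b) {
    List<Integer> out = new ArrayList<>();
    int da = a.advance(0), db = b.advance(0);
    while (da != Clause.END && db != Clause.END) {
      if (da == db) {
        out.add(da);
        da = a.advance(da + 1);
        db = b.advance(db + 1);
      } else if (da < db) {
        da = a.advance(db);
      } else {
        db = b.advance(da);
      }
    }
    return out;
  }

  // Two-phase: zig-zag on approximations only, verify when they agree.
  static List<Integer> twoPhase(Clause a, Clause b) {
    List<Integer> out = new ArrayList<>();
    int da = a.approxAdvance(0), db = b.approxAdvance(0);
    while (da != Clause.END && db != Clause.END) {
      if (da == db) {
        if (a.matches() && b.matches()) out.add(da);
        da = a.approxAdvance(da + 1);
        db = b.approxAdvance(db + 1);
      } else if (da < db) {
        da = a.approxAdvance(db);
      } else {
        db = b.approxAdvance(da);
      }
    }
    return out;
  }
}
```

With documents 0..9, the single-phase loop runs the phrase check on all 10 documents, while the two-phase loop runs it only on the 5 odd documents where the term's candidates and the phrase's approximation coincide; both return an empty result.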



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
