[
https://issues.apache.org/jira/browse/LUCENE-6198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290220#comment-14290220
]
Robert Muir commented on LUCENE-6198:
-------------------------------------
{quote}
If we were doing pure greenfield API design, w/o any concern for backcompat,
and we wanted to support this type of multi-phase intersection support in the
Doc iterators, what would we want it to look like?
{quote}
Then it would not be on DocIdSetIterator. I put it there to stay backwards
compatible and to try to make filters fast too. But it does not belong there,
so the changes are intended to be minimal and to interfere as little as
possible with things like postings lists.
I'll be honest: I don't want to wait until 6.0 to fix the performance of
proximity queries. Doing things properly would, IMO, require removing Filter
and merging it with Query, and I don't think I can do that easily in 5.x.
{quote}
...and eliminate the need for callers/wrappers to know/care if/when the
implementation supports "approximations" – instead all callers just check
candidateIsMatch() and we trust the JVM to optimize that call away when it's a
constant.
{quote}
But there are several problems here:
1. The need is real. That's why the two-phase scorer keeps a separate array
of the clauses it must verify: only the approximate ones! Which particular
compiler optimization did you think would help in that case, so that
conjunctions don't have performance issues? I develop with the latest 8u40-ea.
2. It makes it much more difficult for a _single Java object_ to support
both approximate and exact iteration. It's far easier to just return
something wrapping an underlying structure that already exists (e.g. the
underlying conjunction of a phrase) and not have to support switching back and
forth.
3. I don't think Lucene should have scoring or iteration APIs that are
"approximate by definition". That is *by far* too hard to use for the 90% case.
Instead we should have the ability to explicitly "ask" for an approximation in
special cases: zig-zag intersection, document filtering, etc.
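
To make the three points above concrete, here is a minimal sketch of the
"explicit ask" shape. It is not the attached patch and not Lucene's API:
DocIterator, TwoPhase, asTwoPhase(), PhraseLike, exact() and intersect() are
all hypothetical names invented for illustration. Iterators are exact by
default, an approximation is handed out as a separate object around the
existing candidate list, and the conjunction keeps a separate list of only the
clauses it must verify.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical sketch: none of these types are Lucene's real classes.
final class ExplicitApproximationSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Exact by definition: every doc returned is a real match. */
    interface DocIterator {
        /** First matching doc >= target, or NO_MORE_DOCS. Targets must not decrease. */
        int advance(int target);

        /** Callers that want an approximation ask for it; null means "exact only". */
        default TwoPhase asTwoPhase() { return null; }
    }

    /** Opt-in view: a cheap superset of the matches plus an expensive per-doc check. */
    interface TwoPhase {
        DocIterator approximation();
        boolean matches(int doc);
    }

    /** Exact iterator over a sorted doc list, standing in for a term's postings. */
    static DocIterator exact(int... sortedDocs) {
        return new DocIterator() {
            private int i = 0;
            @Override public int advance(int target) {
                while (i < sortedDocs.length && sortedDocs[i] < target) i++;
                return i < sortedDocs.length ? sortedDocs[i] : NO_MORE_DOCS;
            }
        };
    }

    /** Phrase-like clause: candidates are cheap to iterate, confirming one is costly. */
    static final class PhraseLike implements DocIterator {
        private final int[] candidates;             // docs containing all the terms
        private final IntPredicate positionsMatch;  // stands in for reading positions
        private final DocIterator cursor;
        int positionChecks = 0;

        PhraseLike(int[] candidates, IntPredicate positionsMatch) {
            this.candidates = candidates;
            this.positionsMatch = positionsMatch;
            this.cursor = exact(candidates);
        }

        private boolean verify(int doc) { positionChecks++; return positionsMatch.test(doc); }

        /** Exact iteration for the common case: verifies every candidate it lands on. */
        @Override public int advance(int target) {
            int doc = cursor.advance(target);
            while (doc != NO_MORE_DOCS && !verify(doc)) doc = cursor.advance(doc + 1);
            return doc;
        }

        /** A separate object around the existing candidate list; no mode switching. */
        @Override public TwoPhase asTwoPhase() {
            DocIterator approx = exact(candidates);
            return new TwoPhase() {
                @Override public DocIterator approximation() { return approx; }
                @Override public boolean matches(int doc) { return verify(doc); }
            };
        }
    }

    /** Zig-zag over approximations (needs at least one clause), then verify. */
    static List<Integer> intersect(DocIterator... clauses) {
        DocIterator[] iters = new DocIterator[clauses.length];
        List<TwoPhase> toVerify = new ArrayList<>(); // only the approximate clauses
        for (int i = 0; i < clauses.length; i++) {
            TwoPhase tp = clauses[i].asTwoPhase();
            iters[i] = (tp == null) ? clauses[i] : tp.approximation();
            if (tp != null) toVerify.add(tp);
        }
        List<Integer> hits = new ArrayList<>();
        int target = 0;
        while (true) {
            int doc = iters[0].advance(target);
            if (doc == NO_MORE_DOCS) return hits;
            boolean agreed = true;
            for (int i = 1; i < iters.length && agreed; i++) {
                int other = iters[i].advance(doc);
                if (other == NO_MORE_DOCS) return hits;
                if (other != doc) { target = other; agreed = false; }
            }
            if (!agreed) continue;
            boolean match = true;
            for (TwoPhase tp : toVerify) {
                if (!tp.matches(doc)) { match = false; break; }
            }
            if (match) hits.add(doc);
            target = doc + 1;
        }
    }

    public static void main(String[] args) {
        PhraseLike phrase =
            new PhraseLike(new int[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, doc -> doc % 3 == 0);
        List<Integer> hits = intersect(exact(3, 4, 9), phrase);
        // prints "[3, 9] after 3 position checks"
        System.out.println(hits + " after " + phrase.positionChecks + " position checks");
    }
}
```

A caller that never asks for asTwoPhase() only ever sees exact matches, which
is the 90% case. Only intersect() pays attention to approximations, and it
runs the expensive check only on docs where every clause already agrees.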
> two phase intersection
> ----------------------
>
> Key: LUCENE-6198
> URL: https://issues.apache.org/jira/browse/LUCENE-6198
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Robert Muir
> Attachments: LUCENE-6198.patch
>
>
> Currently some scorers have to do a lot of per-document work to determine if
> a document is a match. The simplest example is a phrase scorer, but there are
> others (spans, sloppy phrase, geospatial, etc).
> Imagine a conjunction with two MUST clauses, one that is a term that matches
> all odd documents, another that is a phrase matching all even documents.
> Today this conjunction will be very expensive, because the zig-zag
> intersection is reading a ton of useless positions.
> The same problem happens with FilteredQuery and anything else that acts like
> a conjunction.
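
The odd/even example from the description can be modeled with a toy counter.
This is only an illustration under stated assumptions: nextWithParity(),
onePhase() and twoPhase() are hypothetical helpers, docs are plain integers in
[0, maxDoc), and the phrase's terms are assumed to co-occur only in even docs,
so its cheap approximation is "all even docs". A "position read" stands for the
expensive per-document phrase check.

```java
// Toy model of the odd-term / even-phrase conjunction; not Lucene code.
final class OddEvenConjunctionDemo {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Next doc >= target (target >= 0) with the given parity, or NO_MORE_DOCS. */
    static int nextWithParity(int target, int parity, int maxDoc) {
        int doc = (target % 2 == parity) ? target : target + 1;
        return doc < maxDoc ? doc : NO_MORE_DOCS;
    }

    /** One phase: the phrase must read positions for every doc it lands on. */
    static int[] onePhase(int maxDoc) {
        int matches = 0, positionReads = 0;
        int term = nextWithParity(0, 1, maxDoc);              // term: odd docs
        while (term != NO_MORE_DOCS) {
            int phrase = nextWithParity(term, 0, maxDoc);     // phrase: even docs
            if (phrase == NO_MORE_DOCS) break;
            positionReads++;                                  // confirm before returning it
            if (phrase == term) {
                matches++;
                term = nextWithParity(term + 1, 1, maxDoc);
            } else {
                term = nextWithParity(phrase, 1, maxDoc);     // zig-zag back to the term
            }
        }
        return new int[] {matches, positionReads};
    }

    /** Two phase: intersect cheap approximations first, read positions on agreement. */
    static int[] twoPhase(int maxDoc) {
        int matches = 0, positionReads = 0;
        int term = nextWithParity(0, 1, maxDoc);
        while (term != NO_MORE_DOCS) {
            int approx = nextWithParity(term, 0, maxDoc);     // no positions touched
            if (approx == NO_MORE_DOCS) break;
            if (approx == term) {
                positionReads++;                              // verify only agreed docs
                matches++;
                term = nextWithParity(term + 1, 1, maxDoc);
            } else {
                term = nextWithParity(approx, 1, maxDoc);
            }
        }
        return new int[] {matches, positionReads};
    }

    public static void main(String[] args) {
        int[] one = onePhase(10);
        int[] two = twoPhase(10);
        // prints "one-phase: 0 matches, 4 position reads"
        System.out.println("one-phase: " + one[0] + " matches, " + one[1] + " position reads");
        // prints "two-phase: 0 matches, 0 position reads"
        System.out.println("two-phase: " + two[0] + " matches, " + two[1] + " position reads");
    }
}
```

Both variants find zero matches. In this model the one-phase variant still
pays for a position read on roughly every other document, while the two-phase
variant pays for none.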