[ 
https://issues.apache.org/jira/browse/LUCENE-6198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292577#comment-14292577
 ] 

Mikhail Khludnev commented on LUCENE-6198:
------------------------------------------

Robert broke my heart twice. So far this thread seems like a pretty graveyard 
for the whole problem family. Let me add my humble proposal.
"Verification" can be achieved by just introducing "advanceOrAbandon()", which 
can be provided by having a DISI (i.e. a Scorer) implement Bits and redefining 
get(int):boolean as stateful. 
Thus high-cost() scorers, which are plagued by very long loops in advance(), 
can signal this by implementing Bits; let's call them _slowpokes_.
Now ConjunctionScorer can let the least-cost scorer lead the leapfrog 
intersection of the regular scorers, i.e. those that do not implement Bits. 
Those that do implement Bits (the slowpokes) just confirm the candidate doc 
via get(), and they don't need to loop!
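
To make the idea concrete, here is a minimal standalone sketch in plain Java. 
It does not use the real Lucene classes: SortedDocs is a hypothetical stand-in 
for a regular scorer, and IntPredicate stands in for the proposed stateful 
Bits.get(int). For brevity there is a single lead scorer, so the leapfrog among 
several regular scorers is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class ConfirmingConjunctionSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a regular scorer: iterates sorted doc ids. */
    static final class SortedDocs {
        private final int[] docs;
        private int pos = -1;

        SortedDocs(int... docs) { this.docs = docs; }

        int nextDoc() {
            pos++;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }
    }

    /**
     * The lead drives iteration. Each slowpoke is only asked to confirm or
     * reject the current candidate, so it never loops to find its own next doc.
     */
    static List<Integer> intersect(SortedDocs lead, List<IntPredicate> slowpokes) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = lead.nextDoc(); doc != NO_MORE_DOCS; doc = lead.nextDoc()) {
            boolean confirmed = true;
            for (IntPredicate slowpoke : slowpokes) {
                if (!slowpoke.test(doc)) { // plays the role of get(doc)
                    confirmed = false;
                    break;
                }
            }
            if (confirmed) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        SortedDocs term = new SortedDocs(1, 4, 6, 9, 12);
        // Made-up slowpoke: pretend the expensive check passes for multiples of 3.
        IntPredicate phrase = doc -> doc % 3 == 0;
        System.out.println(intersect(term, List.<IntPredicate>of(phrase))); // [6, 9, 12]
    }
}
```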
So far so good. However, consider a high-cost regular scorer (e.g. a plain 
disjunction of stopwords) intersected with a slowpoke (e.g. an empty zig-zag 
conjunction, or minShouldMatch pre-4.3). In this case it won't work well: the 
disjunction proposes almost every doc for confirmation, and the slowpoke 
rejects all of them. ConjunctionScorer should count such rejections (those are 
not expensive, btw), and if they happen too many times, it can let the slowpoke 
lead the leapfrog by calling nextDoc() on it, and that loop can be reasonable. 
How does it fit?  
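
A rough sketch of that fallback, again in standalone Java rather than real 
Lucene code. Docs is a hypothetical scorer that can both iterate and confirm, 
and MAX_REJECTIONS is a made-up threshold; I am not proposing a concrete value 
here. Once the rejection budget is spent, the slowpoke takes over as the lead 
and the regular scorer is advanced to its candidates.

```java
import java.util.ArrayList;
import java.util.List;

public class AdaptiveLeadSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    // Hypothetical threshold, chosen only for the example.
    static final int MAX_REJECTIONS = 3;

    /** Hypothetical scorer over sorted doc ids. */
    static final class Docs {
        private final int[] docs;
        private int pos = -1;

        Docs(int... docs) { this.docs = docs; }

        int nextDoc() {
            pos++;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        /** Moves to the first doc >= target (linear scan for brevity). */
        int advance(int target) {
            if (pos < 0) pos = 0;
            while (pos < docs.length && docs[pos] < target) pos++;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        /** Stateful Bits-style confirmation: moves forward, then answers. */
        boolean get(int doc) { return advance(doc) == doc; }
    }

    static List<Integer> intersect(Docs regular, Docs slowpoke) {
        List<Integer> hits = new ArrayList<>();
        int rejections = 0;

        // Phase 1: the regular scorer leads, the slowpoke only confirms.
        int doc = regular.nextDoc();
        while (doc != NO_MORE_DOCS && rejections < MAX_REJECTIONS) {
            if (slowpoke.get(doc)) {
                hits.add(doc);
            } else {
                rejections++;
            }
            doc = regular.nextDoc();
        }

        // Phase 2: too many rejections, so the slowpoke leads from here on.
        if (doc != NO_MORE_DOCS) {
            for (int cand = slowpoke.advance(doc); cand != NO_MORE_DOCS;
                     cand = slowpoke.nextDoc()) {
                int r = regular.advance(cand);
                if (r == NO_MORE_DOCS) break;
                if (r == cand) hits.add(cand);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Docs stopwords = new Docs(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        Docs slowpoke = new Docs(8, 20);
        System.out.println(intersect(stopwords, slowpoke)); // [8]
    }
}
```

In the example the disjunction is rejected on docs 1, 2 and 3, after which the 
slowpoke proposes doc 8 directly instead of being asked about 4 through 7.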


> two phase intersection
> ----------------------
>
>                 Key: LUCENE-6198
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6198
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>         Attachments: LUCENE-6198.patch
>
>
> Currently some scorers have to do a lot of per-document work to determine if 
> a document is a match. The simplest example is a phrase scorer, but there are 
> others (spans, sloppy phrase, geospatial, etc).
> Imagine a conjunction with two MUST clauses, one that is a term that matches 
> all odd documents, another that is a phrase matching all even documents. 
> Today this conjunction will be very expensive, because the zig-zag 
> intersection is reading a ton of useless positions.
> The same problem happens with filteredQuery and anything else that acts like 
> a conjunction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)