Adrien Grand created LUCENE-6585:
------------------------------------

             Summary: Make ConjunctionDISI flatten sub ConjunctionDISI instances
                 Key: LUCENE-6585
                 URL: https://issues.apache.org/jira/browse/LUCENE-6585
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Adrien Grand
            Priority: Minor


Today ConjunctionDISI wraps some sub (two-phase) iterators. I would like to 
improve it by flattening sub iterators when they implement ConjunctionDISI. In 
practice, this would make "+A +(+B +C)" be executed more like "+A +B +C" (only 
in terms of matching, scoring would not change).

My motivation for this is that if we don't flatten and are unlucky, we can 
sometimes hit some worst cases. For instance consider the 3 following postings 
lists (sorted by increasing cost):

A: 1, 1001, 2001, 3001, ...
C: 0, 2, 4, 6, 8, 10, 12, 14, ...
B: 1, 3, 5, 7, 9, 11, 13, 15, ...

If we run "+A +B +C", then everything works fine, we use A as a lead, and 
advance B 1000 by 1000 to find the next match (if any).

However if we run "+A +(+B +C)", then we would iterate B and C 2 by 2 over the 
entire doc ID space when trying to find the first match which occurs on or 
after A:1.

This is an extreme example which is unlikely to happen in practice, but 
flattening would also help a bit on some more common cases. For instance 
imagine that A, B and C have respective costs of 100, 10 and 1000. If you 
search for "+A +(+B +C)", then we will use the most costly iterator (C) to 
confirm matches of B (the least costly iterator, used as a lead) while it would 
have been more efficient to confirm matches of B with A first, since A is less 
costly than C.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to