[
https://issues.apache.org/jira/browse/LUCENE-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600573#comment-14600573
]
Robert Muir commented on LUCENE-6585:
-------------------------------------
Thanks, I am +1 to commit.
Any "problems" are going to be problems in other scorers not obeying API
contracts and not the logic here, so we don't need to hold up the change on
improving testing for those IMO, but I still want to improve those tests
eventually, it can be a followup. Its just good in general to assert more
things about scoring so its less fragile.
> Make ConjunctionDISI flatten sub ConjunctionDISI instances
> ----------------------------------------------------------
>
> Key: LUCENE-6585
> URL: https://issues.apache.org/jira/browse/LUCENE-6585
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-6585.patch, LUCENE-6585.patch
>
>
> Today ConjunctionDISI wraps some sub (two-phase) iterators. I would like to
> improve it by flattening sub iterators when they implement ConjunctionDISI.
> In practice, this would make "+A +(+B +C)" be executed more like "+A +B +C"
> (only in terms of matching, scoring would not change).
> My motivation for this is that if we don't flatten and are unlucky, we can
> sometimes hit some worst cases. For instance consider the 3 following
> postings lists (sorted by increasing cost):
> A: 1, 1001, 2001, 3001, ...
> C: 0, 2, 4, 6, 8, 10, 12, 14, ...
> B: 1, 3, 5, 7, 9, 11, 13, 15, ...
> If we run "+A +B +C", then everything works fine, we use A as a lead, and
> advance B 1000 by 1000 to find the next match (if any).
> However if we run "+A +(+B +C)", then we would iterate B and C 2 by 2 over
> the entire doc ID space when trying to find the first match which occurs on
> or after A:1.
> This is an extreme example which is unlikely to happen in practice, but
> flattening would also help a bit on some more common cases. For instance
> imagine that A, B and C have respective costs of 100, 10 and 1000. If you
> search for "+A +(+B +C)", then we will use the most costly iterator (C) to
> confirm matches of B (the least costly iterator, used as a lead) while it
> would have been more efficient to confirm matches of B with A first, since A
> is less costly than C.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]