[
https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084692#comment-14084692
]
Michael McCandless commented on LUCENE-4396:
--------------------------------------------
Thanks Da, this looks like great progress.
Just to sum things up a bit here:
* Both BooleanArrayScorer and BooleanLinkedScorer (which are Scorers
not BulkScorers) can only be used when there's at least one MUST
clause in the BooleanQuery.
* BooleanArrayScorer grabs the next SIZE (256 now) hits from the
MUST clauses, and then folds in the MUST_NOT and SHOULD.
* BooleanLinkedScorer, like BooleanScorer, matches/scores in windows
of 2048 docIDs at once, but it uses a bitSet (and also the linked
list) to track filled bucket slots.
* BooleanScorer can now also handle MUST clauses.
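To make the BooleanArrayScorer bullet concrete, here is a rough sketch of
the windowing idea only. It is not code from the patch: the class
ArrayWindowSketch, the method nextWindow, and the use of plain arrays and
sets in place of Lucene's scorers are all assumptions for illustration,
and the "score" is just a count of matching SHOULD clauses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; none of these names exist in Lucene.
final class ArrayWindowSketch {

  /**
   * Takes up to `size` docIDs from the sorted MUST list starting at index
   * `from`, drops any that hit MUST_NOT, and counts matching SHOULD clauses.
   * Returns one {docID, shouldMatches} pair per surviving doc.
   */
  static List<int[]> nextWindow(int[] must, int from, int size,
                                Set<Integer> mustNot, List<Set<Integer>> should) {
    List<int[]> hits = new ArrayList<>();
    int end = Math.min(from + size, must.length);
    for (int i = from; i < end; i++) {
      int doc = must[i];
      if (mustNot.contains(doc)) {
        continue; // excluded by a MUST_NOT clause
      }
      int shouldMatches = 0;
      for (Set<Integer> clause : should) {
        if (clause.contains(doc)) {
          shouldMatches++;
        }
      }
      hits.add(new int[] {doc, shouldMatches});
    }
    return hits;
  }
}
```

A real scorer would walk the MUST_NOT and SHOULD iterators in docID order
rather than doing set lookups; the sets here only keep the sketch short.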
It's nice that you're careful to do the math and double/float casting
in the same order as BS2 so the scores match.
It's a bit spooky that collectMore recurses on itself; in theory
there's an adversary that could consume quite a bit of stack, right?
Can we refactor that to the equivalent while loop (it's "just" tail
recursion)?
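Roughly what I have in mind, as a sketch rather than the patch's actual
collectMore: the class CollectMoreSketch, both method names, and the
array-plus-predicate inputs below are made up for illustration. The
recursive form adds one stack frame per window that produces no hit (the
JVM does not eliminate tail calls), while the loop form does the same
work in constant stack.

```java
import java.util.function.IntPredicate;

// Hypothetical stand-in for collectMore; not code from the patch.
final class CollectMoreSketch {
  static final int SIZE = 256; // window size, matching the SIZE mentioned above

  /** Recursive form: returns the index of the first accepted doc, or -1. */
  static int collectMoreRecursive(int[] docs, int from, IntPredicate accept) {
    if (from >= docs.length) {
      return -1;
    }
    int end = Math.min(from + SIZE, docs.length);
    for (int i = from; i < end; i++) {
      if (accept.test(docs[i])) {
        return i;
      }
    }
    // Tail call: every empty window costs another stack frame.
    return collectMoreRecursive(docs, end, accept);
  }

  /** Equivalent loop: same result, no growth in stack depth. */
  static int collectMoreLoop(int[] docs, int from, IntPredicate accept) {
    while (from < docs.length) {
      int end = Math.min(from + SIZE, docs.length);
      for (int i = from; i < end; i++) {
        if (accept.test(docs[i])) {
          return i;
        }
      }
      from = end; // move on to the next window instead of recursing
    }
    return -1;
  }
}
```

The rewrite is mechanical: the recursive call's argument becomes an
assignment to the loop variable, and the base case becomes the loop
condition.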
Unfortunately the logic for picking which scorer to use looks really
complex; hopefully we can simplify it.
Also, do we really need 3 scorer classes (BS, BAS, BLS) for the
non-DAAT case? Ie, does each really provide a compelling situation
where it's better than the others? It's not great adding so much
complexity for performance gains of unusual (so many clauses) boolean
queries...
> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
> Key: LUCENE-4396
> URL: https://issues.apache.org/jira/browse/LUCENE-4396
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Attachments: And.tasks, And.tasks, AndOr.tasks, AndOr.tasks,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, SIZE.perf, all.perf, luceneutil-score-equal.patch,
> luceneutil-score-equal.patch, stat.cpp, stat.cpp, tasks.cpp
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared
> to the other clauses, that BooleanScorer would perform better than
> BooleanScorer2. BooleanScorer still has some vestiges from when it used to
> handle MUST so it shouldn't be hard to bring back this capability ... I think
> the challenging part might be the heuristics on when to use which (likely we
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs
> in this case, eg if suddenly the MUST clause skips 1000000 docs then you want
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you
> are inspired!