[jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

Michael McCandless (JIRA) Mon, 04 Aug 2014 07:19:32 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084692#comment-14084692 ]


Michael McCandless commented on LUCENE-4396:
--------------------------------------------

Thanks Da, this looks like great progress.

Just to sum things up a bit here:

  * Both BooleanArrayScorer and BooleanLinkedScorer (which are Scorers,
    not BulkScorers) can only be used when there's at least one MUST
    clause in the BooleanQuery.

  * BooleanArrayScorer grabs the next SIZE (256 now) hits from the
    MUST clauses, and then folds in the MUST_NOT and SHOULD.

  * BooleanLinkedScorer, like BooleanScorer, matches/scores in windows
    of 2048 docIDs at once, but it uses a bitSet (and also the linked
    list) to track filled bucket slots.

  * BooleanScorer now can also handle MUST clauses.
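
To make the windowed approach in the list above concrete, here is a minimal
sketch. It is not the code from the patch: the class name, the method, and the
use of plain sorted int[] posting lists are all assumptions made for
illustration, and it only does matching (no scores, no SHOULD clauses). It
walks the docID space in windows of 2048, counts MUST matches per bucket slot,
and uses a java.util.BitSet to remember which slots were filled so that only
those are visited and reset.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration only; not the BooleanLinkedScorer from the patch.
final class WindowedBooleanSketch {
    static final int WINDOW = 2048; // window size mentioned in the comment

    /**
     * Returns docIDs that appear in every MUST list and in no MUST_NOT list.
     * Assumes each posting list is sorted, has no duplicates, and holds
     * docIDs in [0, maxDoc). Needs at least one MUST list to produce hits.
     */
    public static List<Integer> match(int[][] must, int[][] mustNot, int maxDoc) {
        List<Integer> hits = new ArrayList<>();
        int[] mustCount = new int[WINDOW];      // per-slot count of matching MUST clauses
        BitSet filled = new BitSet(WINDOW);     // slots touched by any MUST clause
        BitSet prohibited = new BitSet(WINDOW); // slots touched by any MUST_NOT clause
        int[] mustPos = new int[must.length];   // read position in each MUST list
        int[] notPos = new int[mustNot.length]; // read position in each MUST_NOT list

        for (int base = 0; base < maxDoc; base += WINDOW) {
            int end = Math.min(base + WINDOW, maxDoc);

            // Fold every MUST clause into the current window.
            for (int c = 0; c < must.length; c++) {
                int p = mustPos[c];
                while (p < must[c].length && must[c][p] < end) {
                    int slot = must[c][p++] - base;
                    mustCount[slot]++;
                    filled.set(slot);
                }
                mustPos[c] = p;
            }

            // Mark slots excluded by MUST_NOT clauses.
            for (int c = 0; c < mustNot.length; c++) {
                int p = notPos[c];
                while (p < mustNot[c].length && mustNot[c][p] < end) {
                    prohibited.set(mustNot[c][p++] - base);
                }
                notPos[c] = p;
            }

            // Visit only filled slots, in docID order, and reset them as we go.
            for (int slot = filled.nextSetBit(0); slot >= 0; slot = filled.nextSetBit(slot + 1)) {
                if (mustCount[slot] == must.length && !prohibited.get(slot)) {
                    hits.add(base + slot);
                }
                mustCount[slot] = 0;
            }
            filled.clear();
            prohibited.clear();
        }
        return hits;
    }
}
```

The bitSet is what keeps the per-window cleanup proportional to the number of
filled slots rather than to the full 2048 buckets.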

It's nice that you're careful to do the math and double/float casting
in the same order as BS2 so the scores match.

It's a bit spooky that collectMore recurses on itself; in theory
there's an adversary that could consume quite a bit of stack, right?
Can we refactor that to the equivalent while loop (it's "just" tail
recursion)?
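
As a rough illustration of that refactoring (a sketch under assumptions, not
the collectMore from the patch): the class, fields, and fillWindow helper below
are hypothetical. The MUST hits are modeled as a sorted int[] and the MUST_NOT
clauses as a predicate. The recursive form calls itself whenever a whole window
of SIZE candidates is rejected, so a long run of excluded documents deepens the
stack; the loop form does the same work in constant stack.

```java
import java.util.function.IntPredicate;

// Hypothetical illustration only; names and structure are assumptions.
final class CollectMoreSketch {
    static final int SIZE = 256; // window size mentioned in the comment

    private final int[] mustDocs;        // sorted docIDs matching the MUST clauses
    private final IntPredicate excluded; // true if a MUST_NOT clause matches the doc
    private int pos = 0;                 // next unread index in mustDocs
    public final int[] buffer = new int[SIZE];

    public CollectMoreSketch(int[] mustDocs, IntPredicate excluded) {
        this.mustDocs = mustDocs;
        this.excluded = excluded;
    }

    /** Reads up to SIZE candidates and keeps the ones not excluded; returns the kept count. */
    private int fillWindow() {
        int count = 0;
        int end = Math.min(pos + SIZE, mustDocs.length);
        for (; pos < end; pos++) {
            if (!excluded.test(mustDocs[pos])) {
                buffer[count++] = mustDocs[pos];
            }
        }
        return count;
    }

    /** Tail-recursive shape: one stack frame per consecutive empty window. */
    public int collectMoreRecursive() {
        int count = fillWindow();
        if (count == 0 && pos < mustDocs.length) {
            return collectMoreRecursive();
        }
        return count;
    }

    /** Equivalent while loop: same result, constant stack depth. */
    public int collectMoreIterative() {
        int count = fillWindow();
        while (count == 0 && pos < mustDocs.length) {
            count = fillWindow();
        }
        return count;
    }
}
```

Because the recursive call is the last thing the method does, the rewrite is
mechanical: the recursion condition becomes the loop condition.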

Unfortunately the logic for picking which scorer to use looks really
complex; hopefully we can simplify it.

Also, do we really need 3 scorer classes (BS, BAS, BLS) for the
non-DAAT case?  Ie, does each really provide a compelling situation
where it's better than the others?  It's not great adding so much
complexity for performance gains of unusual (so many clauses) boolean
queries...


> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
>                 Key: LUCENE-4396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: And.tasks, And.tasks, AndOr.tasks, AndOr.tasks, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, SIZE.perf, all.perf, luceneutil-score-equal.patch, 
> luceneutil-score-equal.patch, stat.cpp, stat.cpp, tasks.cpp
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared 
> to the other clauses, that BooleanScorer would perform better than 
> BooleanScorer2.  BooleanScorer still has some vestiges from when it used to 
> handle MUST so it shouldn't be hard to bring back this capability ... I think 
> the challenging part might be the heuristics on when to use which (likely we 
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs 
> in this case, eg if suddenly the MUST clause skips 1000000 docs then you want 
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you 
> are inspired!



--
This message was sent by Atlassian JIRA
(v6.2#6252)