[jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

Michael McCandless (JIRA) Wed, 04 Jun 2014 03:35:29 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017605#comment-14017605 ]


Michael McCandless commented on LUCENE-4396:
--------------------------------------------

Thanks Da!

When you say "BNS (without bitset) vs. BS2" that means baseline=BS2
and my_version=BNS (without bitset)?  I just want to make sure I have
the direction right!

With the added bitset, do you still need the linked list at all?
I.e., just use prev/nextSetBit.  I wonder if the bitset (instead of the
linked list) could also help BooleanScorer?  Maybe test this change
separately (e.g. just modify the BS we have today on trunk) to see if it
helps or hurts... if it does help, it seems like BNS could be
used (or BS could be a Scorer, not a BulkScorer) even when there are no
MUST clauses?  I.e., the bitset lets us easily keep the order.  Then we
can merge BS/BNS into one?
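
To make the bitset idea concrete, here is a rough sketch of one scoring
window that marks live slots in a java.util.BitSet and walks them with
nextSetBit, so hits collected in any order come back in doc order.  This
is not the patch and not Lucene code: the class name, the window size and
the method names are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch, not Lucene code: one scoring window tracked by a bitset. */
class WindowBitsetSketch {
    // Assumed window size, chosen only for illustration.
    static final int WINDOW_SIZE = 2048;

    private final BitSet matching = new BitSet(WINDOW_SIZE);
    private final float[] scores = new float[WINDOW_SIZE];

    /** Record a hit from any clause, in any order; the bit marks the slot as live. */
    void collect(int docInWindow, float score) {
        if (!matching.get(docInWindow)) {
            matching.set(docInWindow);
            scores[docInWindow] = 0f; // lazily reset a slot left over from an earlier window
        }
        scores[docInWindow] += score;
    }

    /** Walk the live slots in increasing doc order, then clear the window. */
    List<Integer> drainInOrder() {
        List<Integer> docs = new ArrayList<>();
        for (int d = matching.nextSetBit(0); d >= 0; d = matching.nextSetBit(d + 1)) {
            docs.add(d);
        }
        matching.clear();
        return docs;
    }

    /** Accumulated score for a slot; stays readable until the slot is reused. */
    float score(int docInWindow) {
        return scores[docInWindow];
    }
}
```

Because the bits are visited in index order, no separate "valid list"
needs to be threaded through the buckets, and the window can be drained
in doc order.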

In general, could you attach all new tasks as a single file?  Note that
when you set up a luceneutil test, you can add a task filter using
addTaskPattern, so you run just a subset of the tasks for that one
test.

Strange that the scores are still different between BS/BS2 and BNS/BS2
when using double.

If there's only 1 required clause sent to BS/BNS, can't we use that
clause's scorer directly instead?

Have you explored having BS interact directly with all the MUST
clauses, rather than using ConjunctionScorer?

Because we have wildly divergent results (sometimes one is much
faster, other times it's much slower) we will somehow need to add
logic to pick the right scorer for each query.  But we can defer this
until we're "doneish" iterating the changes to each scorer... it can
come later on.
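
Just to illustrate the shape such selection logic might take, here is a
rough sketch that compares per-clause hit-count estimates.  Everything in
it is an assumption: the class, the enum, the inputs and especially the
ratio, which is a placeholder that would have to come from benchmarks.
It only encodes the intuition from the issue description that a MUST
clause with very few hits relative to the other clauses favors the
doc-at-a-time scorer.

```java
/** Hypothetical sketch, not Lucene code: choose a scorer from per-clause hit-count estimates. */
class ScorerChoiceSketch {
    enum Choice { DOC_AT_A_TIME, WINDOWED }

    // Placeholder ratio, not a measured value.
    static final long SPARSE_MUST_RATIO = 8;

    static Choice choose(long[] mustCosts, long[] shouldCosts) {
        // No MUST clauses: the windowed scorer is what is used today.
        if (mustCosts.length == 0) {
            return Choice.WINDOWED;
        }
        // The sparsest MUST clause bounds how many docs can match at all.
        long minMust = Long.MAX_VALUE;
        for (long c : mustCosts) {
            minMust = Math.min(minMust, c);
        }
        long sumShould = 0;
        for (long c : shouldCosts) {
            sumShould += c;
        }
        // A very sparse MUST clause lets the doc-at-a-time scorer skip most SHOULD postings.
        return minMust < sumShould / SPARSE_MUST_RATIO
            ? Choice.DOC_AT_A_TIME
            : Choice.WINDOWED;
    }
}
```

With the placeholder ratio, a MUST clause estimated at 100 hits against
SHOULD clauses totalling 120000 picks the doc-at-a-time scorer, while a
MUST clause estimated at 40000 hits picks the windowed one.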


> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
>                 Key: LUCENE-4396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: And.tasks, AndOr.tasks, AndOr.tasks, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, luceneutil-score-equal.patch, luceneutil-score-equal.patch
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared 
> to the other clauses, BooleanScorer would perform better than 
> BooleanScorer2.  BooleanScorer still has some vestiges from when it used to 
> handle MUST so it shouldn't be hard to bring back this capability ... I think 
> the challenging part might be the heuristics on when to use which (likely we 
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs 
> in this case, eg if suddenly the MUST clause skips 1000000 docs then you want 
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you 
> are inspired!



--
This message was sent by Atlassian JIRA
(v6.2#6252)
