[jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

Michael McCandless (JIRA) Mon, 11 Aug 2014 02:42:28 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092619#comment-14092619
 ]


Michael McCandless commented on LUCENE-4396:
--------------------------------------------

Thanks Da, this looks like great progress.

bq. In this patch, I have fixed a bug of wrong coord counting.

Is it possible to add a test case that shows what the bug was and
verifies that it's fixed (and stays fixed)?

Also, do we have a test case that fails if DAAT (doc-at-a-time) and
TAAT (term-at-a-time) scoring differ (as they do on trunk today)?  I
know you worked hard / iterated to get these two to produce precisely
the same score, which is awesome!  I want to make sure we don't
regress in the future...
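
To make the kind of regression test I mean concrete, here is a
minimal sketch.  It does not use any Lucene classes: the postings
arrays and the scoreTaat / scoreDaat helpers are hypothetical
stand-ins, and it assumes any score difference comes only from the
order in which per-clause float scores are summed.  A real test would
run the same query through both scorers and compare the hits.

```java
import java.util.Arrays;

// Toy model only: docIds[c] is the sorted doc ID list of clause c, and
// scores[c][j] is the (made-up) score of clause c for docIds[c][j].
class ScoreEquivalenceSketch {

    // Term-at-a-time: visit each clause fully, accumulating into per-doc buckets.
    static float[] scoreTaat(int[][] docIds, float[][] scores, int maxDoc) {
        float[] acc = new float[maxDoc];
        for (int c = 0; c < docIds.length; c++) {
            for (int j = 0; j < docIds[c].length; j++) {
                acc[docIds[c][j]] += scores[c][j];
            }
        }
        return acc;
    }

    // Doc-at-a-time: visit each doc once, summing matching clauses in clause order.
    static float[] scoreDaat(int[][] docIds, float[][] scores, int maxDoc) {
        float[] result = new float[maxDoc];
        int[] pos = new int[docIds.length]; // current position in each clause
        for (int doc = 0; doc < maxDoc; doc++) {
            float sum = 0f;
            for (int c = 0; c < docIds.length; c++) {
                while (pos[c] < docIds[c].length && docIds[c][pos[c]] < doc) {
                    pos[c]++;
                }
                if (pos[c] < docIds[c].length && docIds[c][pos[c]] == doc) {
                    sum += scores[c][pos[c]];
                }
            }
            result[doc] = sum;
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] docIds = { {0, 2, 3}, {1, 2}, {2, 3} };
        float[][] scores = { {0.1f, 0.7f, 0.3f}, {0.2f, 0.4f}, {0.6f, 0.9f} };
        float[] taat = scoreTaat(docIds, scores, 4);
        float[] daat = scoreDaat(docIds, scores, 4);
        // Exact comparison on purpose: a tolerance would hide the regression.
        if (!Arrays.equals(taat, daat)) {
            throw new AssertionError("DAAT and TAAT scores differ: "
                + Arrays.toString(taat) + " vs " + Arrays.toString(daat));
        }
        System.out.println("scores match: " + Arrays.toString(taat));
    }
}
```

The comparison is exact rather than within an epsilon, since the goal
is for both strategies to produce precisely the same score.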

I'm a little worried about the "heavy math" (the matrix) used to
determine which scorer to apply, i.e. it's a little too magical when
you just come across it in the sources.  Can you add a comment to that
part in the code, linking to this issue and explaining the motivation
behind it?  It may also be over-tuned to Wikipedia... but even if it
picks a suboptimal scorer on other corpora, each of these boolean
scorers should do OK.

+1 to work on the javadocs / comments.  Make sure any now-done TODOs
are removed!

Can I commit TestBooleanUnevenly to trunk today?  Seems like there's
no reason to wait...

I'll run some perf tests on this patch too...


> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
>                 Key: LUCENE-4396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: And.tasks, And.tasks, AndOr.tasks, AndOr.tasks, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, SIZE.perf, all.perf, 
> luceneutil-score-equal.patch, luceneutil-score-equal.patch, merge.perf, 
> merge.png, perf.png, stat.cpp, stat.cpp, tasks.cpp
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have a very low hit count compared 
> to the other clauses, BooleanScorer would perform better than 
> BooleanScorer2.  BooleanScorer still has some vestiges from when it used to 
> handle MUST so it shouldn't be hard to bring back this capability ... I think 
> the challenging part might be the heuristics on when to use which (likely we 
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs 
> in this case, e.g. if the MUST clause suddenly skips 1000000 docs then you 
> want to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you 
> are inspired!
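
A rough sketch of the .advance() idea from the description above, for
illustration only.  ToyIterator and countShouldMatches are
hypothetical stand-ins written for this example (they are not Lucene
classes), and the binary search merely stands in for real skip data.
The MUST clause drives iteration, and each SHOULD clause is advanced
straight to the MUST clause's current doc instead of stepping through
the gap one doc at a time.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MustDrivenAdvanceSketch {

    /** Hypothetical stand-in for a postings iterator over a sorted doc ID array. */
    static final class ToyIterator {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;
        private final int[] docs;
        private int idx = -1; // -1 means "not positioned yet"

        ToyIterator(int... docs) {
            this.docs = docs;
        }

        int docID() {
            if (idx < 0) {
                return -1;
            }
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        int nextDoc() {
            if (idx < docs.length) {
                idx++;
            }
            return docID();
        }

        /** Moves to the first doc >= target that lies beyond the current position. */
        int advance(int target) {
            int lo = Math.min(idx + 1, docs.length);
            int hi = docs.length;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (docs[mid] < target) {
                    lo = mid + 1;
                } else {
                    hi = mid;
                }
            }
            idx = lo;
            return docID();
        }
    }

    /** For every doc matching the MUST clause, counts how many SHOULD clauses also match. */
    static Map<Integer, Integer> countShouldMatches(ToyIterator must, ToyIterator... shoulds) {
        Map<Integer, Integer> result = new LinkedHashMap<>();
        for (int doc = must.nextDoc(); doc != ToyIterator.NO_MORE_DOCS; doc = must.nextDoc()) {
            int matches = 0;
            for (ToyIterator should : shoulds) {
                if (should.docID() < doc) {
                    should.advance(doc); // jump over everything the MUST clause skipped
                }
                if (should.docID() == doc) {
                    matches++;
                }
            }
            result.put(doc, matches);
        }
        return result;
    }

    public static void main(String[] args) {
        ToyIterator must = new ToyIterator(5, 1000000);
        ToyIterator should1 = new ToyIterator(1, 2, 3, 5, 7, 1000000);
        ToyIterator should2 = new ToyIterator(2, 4, 6, 999999, 1000000);
        System.out.println(countShouldMatches(must, should1, should2));
    }
}
```

Whether advancing pays off in practice depends on how sparse the MUST
clause is relative to the SHOULD clauses, which is the heuristics
question raised in the description.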



--
This message was sent by Atlassian JIRA
(v6.2#6252)
