[jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

Da Huang (JIRA) Thu, 24 Jul 2014 22:21:07 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074083#comment-14074083
 ]


Da Huang commented on LUCENE-4396:
----------------------------------

{quote}
Do we really need a separate class to make the decision about which scorer to 
use? Seems like the added logic for when to use BNS is fairly small so we could 
just add it into BQ's scorer method instead?
{quote}
OK, I will move the decision logic back to BQ.

{quote}
For bulkScorer, should we ever return BooleanScorer even when there are 
required clauses? Or was that just commented out for temporary benchmarking so 
we'd wrap BNS? When there is a required clause, if BNS is never slower than BS, 
then instead of falling back to super.bulkScorer we could do the wrapping 
ourselves there? Just to make it clearer we are using BNS ... or maybe just put 
a comment saying so (replacing that TODO).
{quote}
BooleanScorer should be used for bulkScorer in some cases. Falling back to 
super.bulkScorer when there are required clauses is only a temporary strategy.
See the following tables.
{code}
                Task  ArrayNotDel           BS       BitSet           ll         llbs        size5        size8        size9
  HighAndSomeHighNot         0.7         15.3*         7.4          8.9          2.0          6.6         10.0          3.4
   HighAndSomeHighOr        13.3         24.5*         7.8          9.1         10.9         17.3+        18.3+        21.3+
   HighAndSomeLowNot       -45.1        -53.9        -55.0        -57.3        -45.5        -47.8        -42.2        -41.5
    HighAndSomeLowOr       -44.7        -55.4        -51.2        -58.1        -54.5        -47.9        -39.7        -44.9
  HighAndTonsHighNot       475.7+       472.7+       507.0+       552.9+       627.9*       149.1        144.7        143.7
   HighAndTonsHighOr       141.0+       135.4+       162.4+       153.4+       169.7*       154.0+       150.0+       149.1+
   HighAndTonsLowNot       -49.9        -66.2        -46.8        -76.9        -30.3        -73.7        -28.6        -15.6
    HighAndTonsLowOr       -22.4        -69.4        -30.2        -67.5        -41.9        -63.8        -24.4        -13.9
   LowAndSomeHighNot         3.7         -2.6         -9.0         -7.3         -6.2          4.5+         6.2*         4.7+
    LowAndSomeHighOr         1.5        -14.0        -15.5        -10.8        -12.0          6.8*         5.8+         6.6+
    LowAndSomeLowNot       -26.4        -43.7        -56.5        -47.3        -43.7          3.7*        -2.3         -4.0
     LowAndSomeLowOr       -23.2        -41.8        -60.5        -46.2        -43.4          2.2*        -2.3         -8.8
   LowAndTonsHighNot       380.6+       171.5        118.4        248.3        381.8*        22.5         23.8         26.5
    LowAndTonsHighOr        29.8*         5.2         -1.1         10.7          5.4         24.2+        27.5+        28.2+
    LowAndTonsLowNot        28.9          9.1        -39.3          5.3          1.3         39.1+        47.2*        44.3+
     LowAndTonsLowOr        30.9+         7.2        -38.1          0.5          9.0         29.9+        40.9*        38.1+

                Task       Good Method
  HighAndSomeHighNot       BS
   HighAndSomeHighOr       BS, size9, size8, size5
   HighAndSomeLowNot       (none)
    HighAndSomeLowOr       (none)
  HighAndTonsHighNot       llbs, ll, BitSet, ArrayNotDel, BS
   HighAndTonsHighOr       llbs, BitSet, size5, ll, size8, size9, ArrayNotDel, BS
   HighAndTonsLowNot       (none)
    HighAndTonsLowOr       (none)
   LowAndSomeHighNot       size8, size9, size5
    LowAndSomeHighOr       size5, size9, size8
    LowAndSomeLowNot       size5
     LowAndSomeLowOr       size5
   LowAndTonsHighNot       llbs, ArrayNotDel
    LowAndTonsHighOr       ArrayNotDel, size9, size8, size5
    LowAndTonsLowNot       size8, size9, size5
     LowAndTonsLowOr       size8, size9, ArrayNotDel, size5
{code}
BS performs best in the HighAndSomeHigh* cases.

{quote}
For the rules on when to use which scorer, it seems like we should take the 
.cost() of the sub-clauses into account somehow...
{quote}
I have already taken .cost() into account; see the rules in the decider.
{code}
    // With more than 3 optional clauses, use BooleanNovelScorer (BNS) when the
    // first required clause is cheaper than the first optional clause.
    if (!required.isEmpty() && optional.size() > 3) {
      float times = (float) required.get(0).cost() / optional.get(0).cost();
      if (times < 1) {
        return new BooleanNovelScorer(weight, disableCoord, minShouldMatch,
            required, optional, prohibited, maxCoord);
      }
    }
    // Same check against the first prohibited clause.
    if (!required.isEmpty() && prohibited.size() > 3) {
      float times = (float) required.get(0).cost() / prohibited.get(0).cost();
      if (times < 1) {
        return new BooleanNovelScorer(weight, disableCoord, minShouldMatch,
            required, optional, prohibited, maxCoord);
      }
    }
{code}
Here, I only take the first scorer's cost into account, as iterating over all 
scorers may be expensive.
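
To make the rule easier to try out in isolation, here is a minimal standalone sketch of the same first-clause cost comparison. It is not the patch code: the class name ScorerChoiceSketch, the Choice enum, and the use of plain long costs in place of scorers are assumptions made for illustration only, and FALL_THROUGH simply stands for whatever the decider does when neither rule fires.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the decision rule; not Lucene's actual API.
final class ScorerChoiceSketch {

  // NOVEL_SCORER: the rule picks BooleanNovelScorer.
  // FALL_THROUGH: neither rule fired (what happens next is outside this sketch).
  enum Choice { NOVEL_SCORER, FALL_THROUGH }

  // Each list holds the cost() values of the clauses of one kind.
  // As in the quoted rule, only the first entry of each list is inspected.
  static Choice choose(List<Long> required, List<Long> optional, List<Long> prohibited) {
    if (!required.isEmpty() && optional.size() > 3) {
      long req = required.get(0);
      long opt = optional.get(0);
      float times = (float) req / opt;
      if (times < 1) {
        return Choice.NOVEL_SCORER;
      }
    }
    if (!required.isEmpty() && prohibited.size() > 3) {
      long req = required.get(0);
      long pro = prohibited.get(0);
      float times = (float) req / pro;
      if (times < 1) {
        return Choice.NOVEL_SCORER;
      }
    }
    return Choice.FALL_THROUGH;
  }

  public static void main(String[] args) {
    List<Long> none = Collections.<Long>emptyList();
    // Cheap required clause, four optional clauses: the first rule fires.
    System.out.println(choose(Arrays.asList(10L), Arrays.asList(100L, 5L, 5L, 5L), none));
    // Expensive required clause: the ratio is 10, so no rule fires.
    System.out.println(choose(Arrays.asList(1000L), Arrays.asList(100L, 5L, 5L, 5L), none));
  }
}
```

Because only the first clause of each list is compared, the outcome depends on the order of the clauses; the sketch keeps that behaviour on purpose so that it matches the rule as quoted.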

> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
>                 Key: LUCENE-4396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: And.tasks, AndOr.tasks, AndOr.tasks, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, SIZE.perf, all.perf, luceneutil-score-equal.patch, 
> luceneutil-score-equal.patch, stat.cpp, stat.cpp
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared 
> to the other clauses, that BooleanScorer would perform better than 
> BooleanScorer2.  BooleanScorer still has some vestiges from when it used to 
> handle MUST so it shouldn't be hard to bring back this capability ... I think 
> the challenging part might be the heuristics on when to use which (likely we 
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs 
> in this case, eg if suddenly the MUST clause skips 1000000 docs then you want 
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you 
> are inspired!



--
This message was sent by Atlassian JIRA
(v6.2#6252)
