[jira] [Comment Edited] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

Da Huang (JIRA) Tue, 03 Jun 2014 06:26:31 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016491#comment-14016491
 ]


Da Huang edited comment on LUCENE-4396 at 6/3/14 1:24 PM:
----------------------------------------------------------

This patch is based on the Lucene GitHub mirror at commit 
cf10341825ff6bd1662dd48c51926bc51d751ce5.

I use a bitset over the required docs to skip docs when scanning the optional 
and prohibited docs. The performance comparison is at the bottom.
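
To make the bitset idea concrete, here is a minimal sketch of one plausible reading of it. It is not the code from the patch: the class name, the method name, and the use of plain sorted int arrays in place of Lucene postings are all assumptions made for illustration. Required docs are recorded in a java.util.BitSet first, and the scans over prohibited and optional docs then skip any doc whose bit is not set.

```java
import java.util.BitSet;

// Hypothetical illustration only; none of these names come from the patch.
final class RequiredBitsetSketch {

    /**
     * Returns an array of length maxDoc. Entry d is -1 if doc d does not match
     * (not required, or prohibited); otherwise it is the number of optional
     * clauses that contain d.
     */
    static int[] scan(int[] required, int[][] optional, int[][] prohibited, int maxDoc) {
        // Step 1: mark every required doc in the bitset.
        BitSet candidates = new BitSet(maxDoc);
        for (int doc : required) {
            candidates.set(doc);
        }

        // Step 2: scan prohibited docs, skipping docs that are not candidates.
        for (int[] postings : prohibited) {
            for (int doc : postings) {
                if (!candidates.get(doc)) {
                    continue; // not required (or already removed): nothing to do
                }
                candidates.clear(doc);
            }
        }

        // Step 3: scan optional docs, again skipping non-candidates.
        int[] optionalHits = new int[maxDoc];
        for (int[] postings : optional) {
            for (int doc : postings) {
                if (!candidates.get(doc)) {
                    continue; // cannot match, so no per-doc work is spent on it
                }
                optionalHits[doc]++;
            }
        }

        // Step 4: flag every non-matching doc with -1.
        for (int doc = 0; doc < maxDoc; doc++) {
            if (!candidates.get(doc)) {
                optionalHits[doc] = -1;
            }
        }
        return optionalHits;
    }
}
```

In this sketch the per-doc work is trivial, so the skip saves little. The assumption is that in a real scorer the skipped work (bucket updates, score accumulation) is what makes the check worthwhile.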

I also built a new tasks file to test performance, and found that BNS speeds 
up the "+a -b -c -d ..." case considerably when "b c d ..." hit many docs.

{code}
BNS (without bitset) vs. BS2
                    Task    QPS baseline      StdDev    QPS my_version      StdDev                Pct diff
       HighAndTonsLowNot        4.29      (2.9%)        1.08      (0.6%)   -74.8% ( -76% -  -73%)
        HighAndTonsLowOr        4.87      (6.4%)        1.24      (1.0%)   -74.4% ( -76% -  -71%)
       HighAndSomeLowNot        9.03      (5.2%)        4.11      (4.1%)   -54.4% ( -60% -  -47%)
        HighAndSomeLowOr       16.21      (9.6%)        7.75      (4.1%)   -52.2% ( -60% -  -42%)
         LowAndSomeLowOr      303.28      (2.4%)      183.14      (6.6%)   -39.6% ( -47% -  -31%)
        LowAndSomeLowNot      257.24      (1.8%)      157.07      (6.5%)   -38.9% ( -46% -  -31%)
        LowAndSomeHighOr       36.78      (1.9%)       33.74      (3.0%)    -8.3% ( -12% -   -3%)
        LowAndTonsLowNot       21.28      (2.0%)       19.69      (6.9%)    -7.5% ( -16% -    1%)
       LowAndSomeHighNot       34.40      (1.6%)       33.69      (3.2%)    -2.1% (  -6% -    2%)
                PKLookup      100.63      (4.8%)      103.46      (4.7%)     2.8% (  -6% -   12%)
        LowAndTonsHighOr        1.26      (1.6%)        1.41      (1.7%)    11.8% (   8% -   15%)
         LowAndTonsLowOr       13.66      (0.9%)       15.50      (6.0%)    13.5% (   6% -   20%)
      HighAndSomeHighNot        2.65      (1.4%)        3.12      (6.5%)    17.6% (   9% -   25%)
       HighAndSomeHighOr        2.21      (2.4%)        2.62      (5.8%)    18.6% (  10% -   27%)
       HighAndTonsHighOr        0.07      (0.8%)        0.19     (10.5%)   160.3% ( 147% -  172%)
       LowAndTonsHighNot        2.86      (1.6%)       10.24     (18.1%)   257.7% ( 234% -  281%)
      HighAndTonsHighNot        0.05      (0.8%)        0.40     (28.2%)   641.8% ( 607% -  676%)
      

BS vs. BS2
                    Task    QPS baseline      StdDev    QPS my_version      StdDev                Pct diff
        HighAndTonsLowOr        4.02      (6.8%)        0.87      (0.5%)   -78.2% ( -80% -  -76%)
       HighAndTonsLowNot        4.95      (3.4%)        1.29      (0.9%)   -73.9% ( -75% -  -72%)
        HighAndSomeLowOr       14.45      (9.5%)        6.68      (3.7%)   -53.8% ( -61% -  -44%)
       HighAndSomeLowNot       14.78      (5.1%)        7.48      (3.9%)   -49.4% ( -55% -  -42%)
         LowAndSomeLowOr      316.55      (2.2%)      170.14      (5.6%)   -46.3% ( -52% -  -39%)
        LowAndSomeLowNot      283.47      (1.7%)      157.35      (6.0%)   -44.5% ( -51% -  -37%)
        LowAndSomeHighOr       39.39      (2.0%)       35.07      (3.1%)   -11.0% ( -15% -   -6%)
       LowAndSomeHighNot       53.96      (2.0%)       48.57      (3.8%)   -10.0% ( -15% -   -4%)
        LowAndTonsLowNot       17.97      (1.5%)       17.04      (6.0%)    -5.2% ( -12% -    2%)
                PKLookup       97.57      (2.7%)      100.21      (5.2%)     2.7% (  -5% -   10%)
        LowAndTonsHighOr        3.59      (1.7%)        3.74      (2.4%)     4.1% (   0% -    8%)
         LowAndTonsLowOr       14.71      (1.3%)       15.63      (5.7%)     6.3% (   0% -   13%)
      HighAndSomeHighNot        1.84      (1.3%)        2.05      (5.6%)    11.2% (   4% -   18%)
       HighAndSomeHighOr        1.93      (2.1%)        2.16      (5.6%)    11.9% (   4% -   20%)
       HighAndTonsHighOr        0.05      (1.0%)        0.13     (14.1%)   144.8% ( 128% -  161%)
       LowAndTonsHighNot        1.63      (1.9%)        4.95      (7.2%)   204.0% ( 191% -  217%)
      HighAndTonsHighNot        0.06      (1.0%)        0.34     (18.2%)   459.6% ( 435% -  483%)


BNS (with bitset) vs. BS2
                    Task    QPS baseline      StdDev    QPS my_version      StdDev                Pct diff
        HighAndSomeLowOr        7.45     (12.0%)        3.49      (6.6%)   -53.1% ( -64% -  -39%)
       HighAndSomeLowNot       10.45      (8.0%)        5.25      (6.8%)   -49.7% ( -59% -  -37%)
         LowAndSomeLowOr      310.53      (2.3%)      168.56      (5.8%)   -45.7% ( -52% -  -38%)
        LowAndSomeLowNot      292.05      (2.3%)      165.88      (5.7%)   -43.2% ( -50% -  -36%)
       HighAndTonsLowNot        5.94      (3.5%)        4.33      (6.8%)   -27.0% ( -36% -  -17%)
        HighAndTonsLowOr        5.92      (4.4%)        4.39      (6.0%)   -25.9% ( -34% -  -16%)
       LowAndSomeHighNot       53.79      (2.4%)       47.71      (2.8%)   -11.3% ( -16% -   -6%)
        LowAndSomeHighOr       31.03      (2.6%)       28.20      (2.4%)    -9.1% ( -13% -   -4%)
         LowAndTonsLowOr       18.58      (1.1%)       17.60      (6.2%)    -5.3% ( -12% -    2%)
      HighAndSomeHighNot        1.49      (1.8%)        1.44      (8.9%)    -3.5% ( -13% -    7%)
                PKLookup       96.96      (3.4%)      100.03      (5.1%)     3.2% (  -5% -   12%)
        LowAndTonsHighOr        2.06      (2.2%)        2.18      (2.3%)     5.9% (   1% -   10%)
        LowAndTonsLowNot       13.63      (1.3%)       14.57      (6.3%)     6.9% (   0% -   14%)
       HighAndSomeHighOr        2.03      (2.4%)        2.33      (8.1%)    14.5% (   3% -   25%)
       HighAndTonsHighOr        0.07      (0.8%)        0.17     (13.6%)   158.2% ( 142% -  174%)
       LowAndTonsHighNot        1.40      (2.2%)        6.21     (11.3%)   344.2% ( 323% -  365%)
      HighAndTonsHighNot        0.07      (1.1%)        0.46     (24.2%)   572.1% ( 540% -  604%)
      
      
{code}
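
For readers unfamiliar with the table layout, the "Pct diff" column appears to be the relative change of the my_version QPS against the baseline QPS. The sketch below is an assumption inferred from the numbers above, not taken from the benchmark tool's source, and the class and method names are hypothetical.

```java
// Hypothetical helper; inferred from the reported numbers, not from the benchmark tool.
final class PctDiffSketch {

    /** Relative change of candidate QPS versus baseline QPS, in percent. */
    static double pctDiff(double baselineQps, double candidateQps) {
        return 100.0 * (candidateQps - baselineQps) / baselineQps;
    }
}
```

For example, the first row of the first table (baseline 4.29, my_version 1.08) gives 100 * (1.08 - 4.29) / 4.29, roughly -74.8%, which matches the reported value. Rows with very small QPS do not reproduce exactly from the printed figures: 0.05 and 0.40 would give 700% rather than the reported 641.8%, presumably because the printed QPS values are rounded to two decimals.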









> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
>                 Key: LUCENE-4396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: And.tasks, AndOr.tasks, AndOr.tasks, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, luceneutil-score-equal.patch, luceneutil-score-equal.patch
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared 
> to the other clauses, that BooleanScorer would perform better than 
> BooleanScorer2.  BooleanScorer still has some vestiges from when it used to 
> handle MUST so it shouldn't be hard to bring back this capability ... I think 
> the challenging part might be the heuristics on when to use which (likely we 
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs 
> in this case, eg if suddenly the MUST clause skips 1000000 docs then you want 
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you 
> are inspired!
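
The .advance() idea in the description above can be sketched as follows. This is only an illustration under stated assumptions: sorted int arrays stand in for postings, the class and method names are hypothetical, and a binary search plays the role of a skip-capable advance. It is not Lucene's scorer API.

```java
import java.util.Arrays;

// Hypothetical illustration; sorted int arrays stand in for postings lists.
final class AdvanceSketch {

    /** Smallest index i >= from with postings[i] >= target, or postings.length if none. */
    static int advance(int[] postings, int from, int target) {
        int idx = Arrays.binarySearch(postings, from, postings.length, target);
        return idx >= 0 ? idx : -idx - 1;
    }

    /** Counts docs that appear in both the MUST postings and the SHOULD postings. */
    static int countBoth(int[] must, int[] should) {
        int pos = 0;
        int both = 0;
        for (int doc : must) {
            // Jump the SHOULD cursor straight to the MUST doc instead of
            // stepping through every SHOULD doc in between.
            pos = advance(should, pos, doc);
            if (pos == should.length) {
                break; // SHOULD postings exhausted
            }
            if (should[pos] == doc) {
                both++;
            }
        }
        return both;
    }
}
```

When the MUST clause jumps far ahead, each call to advance costs a logarithmic search here instead of a linear walk over the skipped SHOULD docs, which is the effect the description is after.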



