[jira] [Updated] (LUCENE-6179) Remove out-of-order scoring

Adrien Grand (JIRA) Tue, 13 Jan 2015 07:38:25 -0800

     [ https://issues.apache.org/jira/browse/LUCENE-6179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Adrien Grand updated LUCENE-6179:
---------------------------------
    Attachment: bool_or.tasks
                LUCENE-6179.patch

Here is a patch. It removes out-of-order scoring support and makes 
BooleanScorer (aka BS1) score documents in order. The new version is slower 
than current trunk:

{code}
                    Task    QPS baseline      StdDev   QPS patch      StdDev                Pct diff
                OrLowMed       93.19      (5.9%)       78.33      (4.8%)  -16.0% ( -25% -   -5%)
                OrLowLow      282.29      (5.0%)      237.59      (4.1%)  -15.8% ( -23% -   -7%)
             OrLowLowLow      203.94      (7.6%)      172.27      (4.8%)  -15.5% ( -25% -   -3%)
                OrMedMed       57.48      (6.4%)       49.40      (5.3%)  -14.1% ( -24% -   -2%)
               OrLowHigh       15.42      (6.4%)       13.63      (5.4%)  -11.6% ( -21% -    0%)
             OrMedMedMed       37.51      (6.3%)       33.71      (5.6%)  -10.1% ( -20% -    1%)
               OrMedHigh       18.73      (6.0%)       16.89      (5.4%)   -9.8% ( -20% -    1%)
              OrHighHigh       10.94      (5.8%)       10.19      (5.7%)   -6.8% ( -17% -    5%)
                PKLookup      259.62      (0.9%)      258.13      (1.0%)   -0.6% (  -2% -    1%)
          OrHighHighHigh        7.85      (5.4%)        7.86      (6.6%)    0.1% ( -11% -   12%)
                 LowTerm      687.79      (3.5%)      694.17      (3.2%)    0.9% (  -5% -    7%)
                HighTerm       22.81      (5.6%)       23.11      (1.3%)    1.3% (  -5% -    8%)
                 MedTerm      154.48      (5.2%)      162.79      (2.6%)    5.4% (  -2% -   13%)
{code}

but still much faster than the Scorer-based BulkScorer:

{code}
                    Task    QPS baseline      StdDev   QPS patch      StdDev                Pct diff
                 LowTerm      671.14      (5.0%)      665.62      (3.2%)   -0.8% (  -8% -    7%)
                OrLowLow      232.81      (6.2%)      231.60      (3.8%)   -0.5% (  -9% -   10%)
                PKLookup      254.28      (1.2%)      253.31      (0.9%)   -0.4% (  -2% -    1%)
                HighTerm       19.47      (3.0%)       20.31      (1.1%)    4.3% (   0% -    8%)
                 MedTerm      149.88      (4.7%)      162.49      (1.8%)    8.4% (   1% -   15%)
             OrLowLowLow      151.55      (5.2%)      165.27      (5.9%)    9.1% (  -2% -   21%)
                OrLowMed       67.87      (6.2%)       80.68      (5.6%)   18.9% (   6% -   32%)
               OrLowHigh        7.97      (5.8%)       10.36      (7.7%)   29.9% (  15% -   46%)
                OrMedMed       35.97      (5.1%)       48.45      (7.5%)   34.7% (  20% -   49%)
               OrMedHigh       12.44      (5.6%)       17.01      (8.0%)   36.7% (  21% -   53%)
              OrHighHigh        5.38      (5.2%)        8.16     (10.2%)   51.7% (  34% -   70%)
             OrMedMedMed       21.63      (4.8%)       34.07      (8.8%)   57.5% (  41% -   74%)
          OrHighHighHigh        3.96      (4.5%)        6.78     (12.0%)   71.2% (  52% -   91%)
{code}

The new BooleanScorer uses a bit set to sort the documents that matched 
within the window. As Robert pointed out, I don't have CTZ/NTZ support on my 
JVM, so this run of the benchmark is actually a worst case. It should perform 
better on a newer machine:

{code}
$ java -XX:+PrintFlagsFinal -version | grep Instruction
     intx FenceInstruction                          = 0                                   {ARCH product}
     bool UseBMI1Instructions                       = false                               {ARCH product}
     bool UseCountLeadingZerosInstruction           = false                               {ARCH product}
     bool UseCountTrailingZerosInstruction          = false                               {ARCH product}
     bool UsePopCountInstruction                    = true                                {product}
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
{code}
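
To illustrate why trailing-zero counting matters here, below is a minimal sketch of the general technique: matches within a window are recorded in a bit set in whatever order they arrive, then read back in increasing doc ID order by repeatedly taking the lowest set bit. This is not the code from the patch. The class name, method name, and window size are all hypothetical and chosen only for illustration.

```java
import java.util.Arrays;

public class WindowBitSetSketch {
    // Hypothetical window size, for illustration only.
    static final int WINDOW_SIZE = 2048;

    /**
     * Records doc IDs that fall in [windowBase, windowBase + WINDOW_SIZE)
     * in a bit set, then returns them in increasing order.
     */
    static int[] collectInOrder(int windowBase, int[] matchedDocs) {
        long[] bits = new long[WINDOW_SIZE / 64];
        for (int doc : matchedDocs) {
            int offset = doc - windowBase;
            if (offset < 0 || offset >= WINDOW_SIZE) {
                throw new IllegalArgumentException("doc outside window: " + doc);
            }
            // Set the bit for this doc; arrival order does not matter.
            bits[offset >>> 6] |= 1L << (offset & 63);
        }

        int[] out = new int[matchedDocs.length];
        int count = 0;
        for (int word = 0; word < bits.length; word++) {
            long w = bits[word];
            while (w != 0L) {
                // Index of the lowest set bit: this is the CTZ/NTZ operation.
                int ntz = Long.numberOfTrailingZeros(w);
                out[count++] = windowBase + (word << 6) + ntz;
                w &= w - 1; // clear the lowest set bit
            }
        }
        // Duplicates in the input collapse to a single bit, so trim the result.
        return Arrays.copyOf(out, count);
    }

    public static void main(String[] args) {
        int[] docs = {4100, 4097, 5000, 4096, 4160};
        System.out.println(Arrays.toString(collectInOrder(4096, docs)));
    }
}
```

With the sample input in {{main}}, the doc IDs come back as 4096, 4097, 4100, 4160, 5000. The inner loop calls {{Long.numberOfTrailingZeros}} once per match, so when the JVM cannot use a hardware count-trailing-zeros instruction (as the flags above suggest for this machine), that call is presumably where the extra cost shows up.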

Some final notes about the patch:
 - it removes a significant amount of code: {{97 files changed, 411 insertions(+), 2131 deletions(-)}}
 - since Solr always forces in-order scoring, the patch would actually make 
Solr faster, since Solr would now use BS1
 - some queries that did not bother to implement scoresDocsOutOfOrder correctly 
might be faster, because we will no longer pick a collector that supports 
out-of-order scoring while documents are actually collected in order (see e.g. 
FilteredQuery)

If someone is interested in reproducing the benchmark, I uploaded the tasks 
file that I used.

> Remove out-of-order scoring
> ---------------------------
>
>                 Key: LUCENE-6179
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6179
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>             Fix For: 5.0, Trunk
>
>         Attachments: LUCENE-6179.patch, bool_or.tasks
>
>
> Out-of-order currently adds complexity that I would like to remove. Here is a 
> selection of issues that come from out-of-order scoring.
>  - lots of specializations with collectors: we have two versions of every top 
> score/field collector depending on whether it should support out-of-order 
> collection or not
>  - it feels like it should be an implementation detail of our bulk scorers 
> but it also makes our APIs more complicated, eg. 
> LeafCollector.acceptsDocsOutOfOrder
>  - if you create a TopFieldCollector, how do you know if you should pass 
> docsScoredInOrder=true or false? To make the decision, you actually need to 
> know whether your query supports out-of-order scoring while the API is on 
> Weight.
> I initially wanted to keep it and improve the decision process in LUCENE-6172 
> but I'm not sure it's the right approach, as it would require making the API 
> even more complicated... hence the suggestion to remove out-of-order scoring 
> completely.



