[jira] [Commented] (LUCENE-9613) Create blocks for ords when it helps in Lucene80DocValuesFormat

Adrien Grand (Jira) Wed, 25 Aug 2021 07:10:33 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-9613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404486#comment-17404486
 ]


Adrien Grand commented on LUCENE-9613:
--------------------------------------

By removing the wrapping of NumericDocValues by SortedDocValues (see attached 
PR) I get even better numbers (the baseline has the above change, so this 
speedup is on top of the previous one).

{noformat}
                     Task  QPS baseline      StdDev   QPS patch      StdDev                Pct diff p-value
          LowSloppyPhrase        138.15      (4.5%)      135.56      (4.0%)   -1.9% (  -9% -    6%) 0.163
         HighSloppyPhrase         38.34      (5.5%)       37.80      (5.7%)   -1.4% ( -11% -   10%) 0.432
          MedSloppyPhrase         68.13      (4.2%)       67.27      (3.6%)   -1.3% (  -8% -    6%) 0.305
              MedSpanNear         97.29      (2.4%)       96.13      (2.8%)   -1.2% (  -6% -    4%) 0.150
                  Respell        198.32      (5.0%)      195.98      (5.0%)   -1.2% ( -10% -    9%) 0.456
             HighSpanNear         26.55      (3.7%)       26.24      (6.1%)   -1.2% ( -10% -    8%) 0.462
              LowSpanNear         15.53      (2.9%)       15.37      (3.6%)   -1.0% (  -7% -    5%) 0.330
     HighIntervalsOrdered         20.91      (6.2%)       20.73      (5.5%)   -0.9% ( -11% -   11%) 0.633
                 Wildcard        248.72     (13.6%)      246.63     (14.4%)   -0.8% ( -25% -   31%) 0.849
      LowIntervalsOrdered        241.69      (6.7%)      239.97      (6.4%)   -0.7% ( -12% -   13%) 0.731
                LowPhrase         54.03      (3.3%)       53.67      (2.9%)   -0.7% (  -6% -    5%) 0.495
             OrNotHighLow        607.98      (3.2%)      604.28      (3.1%)   -0.6% (  -6% -    5%) 0.542
      MedIntervalsOrdered         33.22      (3.4%)       33.03      (2.8%)   -0.6% (  -6% -    5%) 0.562
                MedPhrase        292.37      (3.8%)      290.84      (3.5%)   -0.5% (  -7% -    6%) 0.648
     HighTermTitleBDVSort         21.76      (2.2%)       21.65      (3.1%)   -0.5% (  -5% -    4%) 0.564
                 HighTerm       2062.70      (4.0%)     2053.13      (4.2%)   -0.5% (  -8% -    8%) 0.722
                OrHighLow        619.26      (2.9%)      616.86      (2.9%)   -0.4% (  -5% -    5%) 0.669
               AndHighLow        922.79      (4.8%)      919.30      (4.1%)   -0.4% (  -8% -    8%) 0.788
                  Prefix3        409.80      (6.5%)      408.28      (6.6%)   -0.4% ( -12% -   13%) 0.857
             OrHighNotMed       1354.26      (3.8%)     1349.40      (4.2%)   -0.4% (  -8% -    7%) 0.777
              AndHighHigh         55.31      (4.0%)       55.14      (5.0%)   -0.3% (  -8% -    9%) 0.838
                   IntNRQ        190.47      (1.0%)      189.99      (0.6%)   -0.3% (  -1% -    1%) 0.351
               AndHighMed        310.69      (5.0%)      310.04      (5.3%)   -0.2% (  -9% -   10%) 0.898
               HighPhrase        210.32      (2.2%)      209.90      (1.9%)   -0.2% (  -4% -    4%) 0.763
               TermDTSort        108.34      (3.2%)      108.15      (3.1%)   -0.2% (  -6% -    6%) 0.856
             OrNotHighMed       1059.23      (2.8%)     1057.74      (3.4%)   -0.1% (  -6% -    6%) 0.887
             OrHighNotLow        919.86      (3.1%)      919.37      (3.2%)   -0.1% (  -6% -    6%) 0.957
                  MedTerm       2131.16      (3.7%)     2140.13      (4.5%)    0.4% (  -7% -    8%) 0.747
            OrNotHighHigh       1217.26      (3.1%)     1222.56      (3.9%)    0.4% (  -6% -    7%) 0.698
    HighTermDayOfYearSort         91.07      (7.1%)       91.73      (7.0%)    0.7% ( -12% -   15%) 0.745
            OrHighNotHigh        924.82      (3.3%)      931.81      (3.6%)    0.8% (  -5% -    7%) 0.486
                   Fuzzy1         66.97      (5.9%)       67.57      (7.0%)    0.9% ( -11% -   14%) 0.657
               OrHighHigh         26.63      (3.1%)       26.88      (3.5%)    0.9% (  -5% -    7%) 0.373
                OrHighMed        100.56      (3.3%)      101.63      (3.4%)    1.1% (  -5% -    8%) 0.315
                  LowTerm       3005.79      (6.2%)     3044.90      (5.7%)    1.3% (  -9% -   14%) 0.490
                   Fuzzy2        151.86     (10.1%)      154.03      (9.2%)    1.4% ( -16% -   23%) 0.642
     BrowseDateTaxoFacets          3.12      (5.6%)        3.17      (3.8%)    1.8% (  -7% -   11%) 0.235
    BrowseMonthTaxoFacets          3.44      (5.3%)        3.50      (4.3%)    1.9% (  -7% -   12%) 0.211
BrowseDayOfYearTaxoFacets          3.12      (5.6%)        3.18      (4.0%)    2.0% (  -7% -   12%) 0.202
        HighTermMonthSort         70.56      (9.8%)       74.99     (10.5%)    6.3% ( -12% -   29%) 0.051
    BrowseMonthSSDVFacets         14.44      (4.9%)       18.97     (34.9%)   31.4% (  -8% -   74%) 0.000
BrowseDayOfYearSSDVFacets         14.88      (7.2%)       19.77     (32.7%)   32.9% (  -6% -   78%) 0.000
{noformat}
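
As a reading aid for the table above: the "Pct diff" figures are consistent with the relative change in mean QPS from baseline to patch. The sketch below only reproduces that arithmetic for a few rows. It is not taken from the benchmark tool, and the class and method names are made up for illustration. It does not attempt to reproduce the bracketed range or the p-value.

```java
import java.util.Locale;

// Illustrative only: PctDiffSketch and pctDiff are hypothetical names,
// not part of Lucene or of the benchmark tooling.
public class PctDiffSketch {

    /** Relative change of patch QPS versus baseline QPS, in percent. */
    public static double pctDiff(double baselineQps, double patchQps) {
        return 100.0 * (patchQps - baselineQps) / baselineQps;
    }

    public static void main(String[] args) {
        // QPS values copied from the table rows named in each label.
        double[][] rows = {
            {138.15, 135.56}, // LowSloppyPhrase
            {70.56, 74.99},   // HighTermMonthSort
            {14.44, 18.97},   // BrowseMonthSSDVFacets
        };
        for (double[] row : rows) {
            System.out.println(String.format(Locale.ROOT, "%.1f%%", pctDiff(row[0], row[1])));
        }
    }
}
```

Note that the two SSDV facet rows also show a much larger StdDev on the patch side (34.9% and 32.7%), which is worth keeping in mind when reading the mean.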

The change might be a bit more controversial given that it requires checking 
some of the numeric optimizations, which is why I didn't push it right away.

> Create blocks for ords when it helps in Lucene80DocValuesFormat
> ---------------------------------------------------------------
>
>                 Key: LUCENE-9613
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9613
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>             Fix For: main (9.0)
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently for sorted(-set) values, we always write ords using 
> log2(valueCount) bits per entry. However, in several cases, such as when the 
> field is used in the index sort or when one value is _very_ common, splitting 
> into blocks like we do for numerics would help.
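
To illustrate the idea in the description, here is a minimal sketch comparing the two bit budgets: one width derived from valueCount for every ord, versus a per-block width derived from each block's (max - min) range. It is not the Lucene80DocValuesFormat code. The class name, method names, block size and sample data are all assumptions made for the example, and per-block header overhead is ignored.

```java
// Illustrative only: OrdBitsSketch and its methods are hypothetical and do
// not mirror the actual Lucene80DocValuesFormat implementation.
public class OrdBitsSketch {

    /** Bits needed to represent values in [0, maxValue]; 0 when maxValue is 0. */
    public static int bitsRequired(long maxValue) {
        return maxValue <= 0 ? 0 : 64 - Long.numberOfLeadingZeros(maxValue);
    }

    /** Total bits when every ord uses one width derived from valueCount. */
    public static long fixedWidthBits(long[] ords, long valueCount) {
        return (long) ords.length * bitsRequired(valueCount - 1);
    }

    /**
     * Total bits when each block stores (ord - blockMin) using a width
     * derived from that block's range. Per-block headers are not counted.
     */
    public static long blockedBits(long[] ords, int blockSize) {
        long total = 0;
        for (int start = 0; start < ords.length; start += blockSize) {
            int end = Math.min(start + blockSize, ords.length);
            long min = Long.MAX_VALUE;
            long max = Long.MIN_VALUE;
            for (int i = start; i < end; i++) {
                min = Math.min(min, ords[i]);
                max = Math.max(max, ords[i]);
            }
            total += (long) (end - start) * bitsRequired(max - min);
        }
        return total;
    }

    public static void main(String[] args) {
        // Ords in doc order for a field used in the index sort:
        // 1024 distinct values, 16 consecutive docs per value.
        long[] sortedOrds = new long[16384];
        for (int i = 0; i < sortedOrds.length; i++) {
            sortedOrds[i] = i / 16;
        }
        System.out.println(fixedWidthBits(sortedOrds, 1024)); // 10 bits per ord
        System.out.println(blockedBits(sortedOrds, 4096));    // 8 bits per ord

        // A block where every doc has the same ord needs 0 bits per entry.
        long[] constantOrds = new long[4096];
        java.util.Arrays.fill(constantOrds, 7L);
        System.out.println(blockedBits(constantOrds, 4096));
    }
}
```

With index-sorted data each block covers a narrow ord range, so its width drops below the global one. A block made up entirely of one very common value collapses to zero bits per entry in this simplified accounting.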



