[jira] [Created] (LUCENE-10333) Speed up BinaryDocValues with a batch reading on LongValues

Feng Guo (Jira) Mon, 20 Dec 2021 20:20:08 -0800

Feng Guo created LUCENE-10333:
---------------------------------

             Summary: Speed up BinaryDocValues with a batch reading on LongValues
                 Key: LUCENE-10333
                 URL: https://issues.apache.org/jira/browse/LUCENE-10333
             Project: Lucene - Core
          Issue Type: Improvement
          Components: core/codecs
            Reporter: Feng Guo



*Description*
In {{Lucene90DocValuesProducer}}, {{BinaryDocValues}} (as well as 
{{SortedNumericDocValues}} in the non-singleton case) has code patterns like this:
{code:java}
long startOffset = addresses.get(doc);
bytes.length = (int) (addresses.get(doc + 1L) - startOffset);
{code}
This means we need to read two longs that are stored next to each other. We 
could probably push this information down to {{LongValues}} and read both 
values in one call. I think this makes sense because this code can be rather hot.
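
To illustrate the idea, here is a minimal, self-contained sketch. It does not use Lucene's actual classes: {{PairLongValues}}, {{getPair}} and {{ArrayAddresses}} are hypothetical names made up for this example, and a real implementation would presumably decode both values from the packed address data rather than from a plain array.

```java
public final class BatchReadSketch {

  /** Hypothetical stand-in for an address reader; NOT Lucene's LongValues class. */
  public abstract static class PairLongValues {
    /** Returns the value stored at the given index. */
    public abstract long get(long index);

    /**
     * Reads the values at index and index + 1 into dst[0] and dst[1].
     * The default simply falls back to two single reads.
     */
    public void getPair(long index, long[] dst) {
      dst[0] = get(index);
      dst[1] = get(index + 1);
    }
  }

  /** Array-backed addresses (an assumption for this sketch) that serve both reads in one call. */
  public static final class ArrayAddresses extends PairLongValues {
    private final long[] values;

    public ArrayAddresses(long[] values) {
      this.values = values;
    }

    @Override
    public long get(long index) {
      return values[Math.toIntExact(index)];
    }

    @Override
    public void getPair(long index, long[] dst) {
      // Convert the index once, then read the two adjacent entries.
      int i = Math.toIntExact(index);
      dst[0] = values[i];
      dst[1] = values[i + 1];
    }
  }

  /** Current pattern: two separate calls to compute the length of a doc's value. */
  public static int lengthTwoCalls(PairLongValues addresses, int doc) {
    long startOffset = addresses.get(doc);
    return (int) (addresses.get(doc + 1L) - startOffset);
  }

  /** Proposed pattern: one batched call that fills a reusable two-element scratch array. */
  public static int lengthOneCall(PairLongValues addresses, int doc, long[] scratch) {
    addresses.getPair(doc, scratch);
    return (int) (scratch[1] - scratch[0]);
  }
}
```

With addresses {{0, 5, 5, 12}}, both helpers return lengths 5, 0 and 7 for docs 0, 1 and 2. Any gain would come from implementations that can locate and decode the two adjacent values in a single step instead of repeating that work per call.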

*Benchmark*

In today's luceneutil benchmark, all results look even. I suspect this is 
because the tasks no longer use {{BinaryDocValues}}. So I rolled both the 
baseline and the candidate back to a stale code version (before 
https://issues.apache.org/jira/browse/LUCENE-10062), in which 
{{BinaryDocValues}} was used to store taxonomy ordinals, and a QPS increase can 
be seen there. (This is tricky; I wonder if we could have a more official way 
to benchmark {{BinaryDocValues}} by changing some params or adding some 
tasks?) Anyway, I believe it is still worth optimizing {{BinaryDocValues}} 
even though facets do not use it any more :)

*Benchmark result on stale code version where taxonomy ordinals are stored in BinaryDocValues (to justify a speed up in BinaryDocValues)*
{code:java}
                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
           BrowseMonthSSDVFacets       17.25      (8.6%)       16.78     (17.8%)   -2.7% ( -26% -   25%) 0.536
                         LowTerm     1458.66      (3.6%)     1438.15      (4.4%)   -1.4% (  -9% -    6%) 0.268
           HighTermDayOfYearSort      108.55     (10.0%)      108.04      (9.1%)   -0.5% ( -17% -   20%) 0.874
                      HighPhrase      168.65      (1.9%)      168.06      (2.3%)   -0.3% (  -4% -    3%) 0.602
                    OrNotHighLow     1201.79      (3.4%)     1197.93      (4.6%)   -0.3% (  -8% -    7%) 0.801
                    HighSpanNear       15.26      (1.6%)       15.21      (1.4%)   -0.3% (  -3% -    2%) 0.499
                         Respell       62.61      (1.8%)       62.45      (1.9%)   -0.3% (  -3% -    3%) 0.649
                       MedPhrase       57.57      (1.4%)       57.44      (1.8%)   -0.2% (  -3% -    2%) 0.648
                       OrHighMed      129.10      (3.0%)      128.83      (3.1%)   -0.2% (  -6% -    6%) 0.830
                     MedSpanNear       19.45      (2.3%)       19.41      (2.2%)   -0.2% (  -4% -    4%) 0.784
                      OrHighHigh       34.85      (1.5%)       34.79      (1.4%)   -0.2% (  -3% -    2%) 0.722
            HighIntervalsOrdered       26.92      (4.7%)       26.89      (4.9%)   -0.1% (  -9% -    9%) 0.929
                          IntNRQ      343.52      (1.6%)      343.16      (2.0%)   -0.1% (  -3% -    3%) 0.855
                   OrHighNotHigh      595.61      (3.2%)      595.10      (4.3%)   -0.1% (  -7% -    7%) 0.944
             MedIntervalsOrdered       17.66      (3.6%)       17.65      (3.8%)   -0.1% (  -7% -    7%) 0.961
             LowIntervalsOrdered      109.23      (3.3%)      109.18      (3.5%)   -0.0% (  -6% -    7%) 0.969
                     AndHighHigh       81.09      (1.5%)       81.10      (2.0%)    0.0% (  -3% -    3%) 0.967
                     LowSpanNear      203.33      (2.1%)      203.41      (1.8%)    0.0% (  -3% -    3%) 0.948
                 MedSloppyPhrase       27.15      (1.5%)       27.17      (1.2%)    0.1% (  -2% -    2%) 0.907
                       LowPhrase       75.76      (1.8%)       75.81      (2.0%)    0.1% (  -3% -    3%) 0.904
         AndHighMedDayTaxoFacets       97.27      (1.9%)       97.35      (1.9%)    0.1% (  -3% -    4%) 0.888
                HighSloppyPhrase       14.32      (2.7%)       14.34      (1.8%)    0.1% (  -4% -    4%) 0.870
                          Fuzzy2       76.00      (3.9%)       76.12      (3.4%)    0.2% (  -6% -    7%) 0.894
                        Wildcard      123.51      (1.8%)      123.71      (2.1%)    0.2% (  -3% -    4%) 0.796
                    OrHighNotLow      722.64      (4.4%)      724.15      (5.4%)    0.2% (  -9% -   10%) 0.894
                      AndHighLow      929.73      (4.0%)      931.75      (3.8%)    0.2% (  -7% -    8%) 0.859
                         Prefix3      240.13      (1.5%)      240.69      (1.9%)    0.2% (  -3% -    3%) 0.675
                      AndHighMed      210.17      (1.7%)      210.84      (1.6%)    0.3% (  -2% -    3%) 0.532
                 LowSloppyPhrase      142.83      (1.8%)      143.54      (2.0%)    0.5% (  -3% -    4%) 0.410
                    OrNotHighMed      709.24      (4.4%)      712.78      (4.3%)    0.5% (  -7% -    9%) 0.715
                          Fuzzy1       85.33      (5.7%)       85.77      (6.3%)    0.5% ( -10% -   13%) 0.786
                         MedTerm     1466.50      (3.5%)     1474.85      (3.9%)    0.6% (  -6% -    8%) 0.629
                      TermDTSort      105.51      (7.7%)      106.33      (7.3%)    0.8% ( -13% -   17%) 0.746
                        PKLookup      206.18      (2.9%)      208.68      (2.9%)    1.2% (  -4% -    7%) 0.179
                    OrHighNotMed      876.71      (3.0%)      887.84      (3.9%)    1.3% (  -5% -    8%) 0.251
                   OrNotHighHigh      774.25      (4.7%)      785.03      (6.0%)    1.4% (  -8% -   12%) 0.411
               HighTermMonthSort       74.33      (9.4%)       75.47     (16.3%)    1.5% ( -22% -   30%) 0.716
                       OrHighLow      518.73      (5.2%)      528.27      (5.4%)    1.8% (  -8% -   13%) 0.272
                        HighTerm     1892.16      (3.4%)     1934.63      (5.5%)    2.2% (  -6% -   11%) 0.120
        AndHighHighDayTaxoFacets       16.46      (2.7%)       16.84      (2.3%)    2.3% (  -2% -    7%) 0.004
            HighTermTitleBDVSort      141.39     (14.6%)      145.33     (15.1%)    2.8% ( -23% -   38%) 0.554
            MedTermDayTaxoFacets       27.81      (2.1%)       29.54      (2.3%)    6.2% (   1% -   10%) 0.000
          OrHighMedDayTaxoFacets        3.05      (1.9%)        3.30      (2.2%)    8.3% (   4% -   12%) 0.000
       BrowseDayOfYearSSDVFacets       17.36     (13.0%)       18.97     (15.8%)    9.3% ( -17% -   43%) 0.042
       BrowseDayOfYearTaxoFacets        3.02      (3.6%)        3.79      (2.5%)   25.4% (  18% -   32%) 0.000
            BrowseDateTaxoFacets        3.01      (3.6%)        3.79      (2.5%)   25.6% (  18% -   32%) 0.000
           BrowseMonthTaxoFacets        3.14      (2.1%)        3.99      (2.5%)   27.0% (  21% -   32%) 0.000
{code}
*Benchmark result on the newest code version*
{code:java}
                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                      TermDTSort      129.74     (10.9%)      127.83     (11.3%)   -1.5% ( -21% -   23%) 0.675
                        HighTerm     1182.13      (5.1%)     1172.76      (6.5%)   -0.8% ( -11% -   11%) 0.668
                    HighSpanNear        7.99      (4.2%)        7.96      (4.2%)   -0.3% (  -8% -    8%) 0.816
            HighIntervalsOrdered       17.86      (2.1%)       17.85      (2.3%)   -0.1% (  -4% -    4%) 0.927
            BrowseDateTaxoFacets       19.61     (17.2%)       19.61     (17.4%)   -0.0% ( -29% -   41%) 0.995
                   OrNotHighHigh      619.85      (4.3%)      619.72      (8.6%)   -0.0% ( -12% -   13%) 0.992
                        PKLookup      202.14      (5.6%)      202.11      (4.4%)   -0.0% (  -9% -   10%) 0.994
             LowIntervalsOrdered       25.53      (1.5%)       25.53      (1.6%)    0.0% (  -3% -    3%) 1.000
       BrowseDayOfYearSSDVFacets       14.27      (2.7%)       14.28      (2.7%)    0.0% (  -5% -    5%) 0.965
             MedIntervalsOrdered       47.33      (1.9%)       47.34      (2.0%)    0.0% (  -3% -    3%) 0.947
     BrowseRandomLabelSSDVFacets       10.25      (2.4%)       10.26      (2.4%)    0.1% (  -4% -    4%) 0.935
           BrowseMonthSSDVFacets       15.66      (3.0%)       15.67      (3.0%)    0.1% (  -5% -    6%) 0.945
                 MedSloppyPhrase       11.97      (1.7%)       11.98      (1.9%)    0.1% (  -3% -    3%) 0.840
                        Wildcard       25.71      (2.6%)       25.75      (2.4%)    0.1% (  -4% -    5%) 0.875
                       MedPhrase       33.62      (2.5%)       33.68      (2.6%)    0.2% (  -4% -    5%) 0.802
           HighTermDayOfYearSort       80.58     (11.0%)       80.76     (10.6%)    0.2% ( -19% -   24%) 0.949
            HighTermTitleBDVSort      130.43     (11.7%)      130.73     (10.7%)    0.2% ( -19% -   25%) 0.947
        AndHighHighDayTaxoFacets       32.25      (3.0%)       32.33      (2.9%)    0.2% (  -5% -    6%) 0.796
                 LowSloppyPhrase       39.50      (1.7%)       39.61      (1.4%)    0.3% (  -2% -    3%) 0.586
                         Prefix3      127.42      (3.8%)      127.77      (3.4%)    0.3% (  -6% -    7%) 0.812
               HighTermMonthSort      117.65      (8.4%)      117.98      (8.1%)    0.3% ( -14% -   18%) 0.915
                HighSloppyPhrase       14.47      (1.8%)       14.51      (2.2%)    0.3% (  -3% -    4%) 0.647
                     MedSpanNear       48.78      (2.2%)       48.93      (2.0%)    0.3% (  -3% -    4%) 0.640
          OrHighMedDayTaxoFacets       13.42      (3.7%)       13.48      (3.6%)    0.4% (  -6% -    7%) 0.730
         AndHighMedDayTaxoFacets       37.90      (3.0%)       38.05      (3.4%)    0.4% (  -5% -    7%) 0.694
                          Fuzzy1       83.31      (3.9%)       83.70      (4.9%)    0.5% (  -7% -    9%) 0.738
                         Respell       49.74      (1.3%)       50.00      (1.5%)    0.5% (  -2% -    3%) 0.254
                       OrHighLow      531.57      (8.0%)      534.83      (6.7%)    0.6% ( -13% -   16%) 0.792
                     AndHighHigh       71.99      (2.6%)       72.44      (3.4%)    0.6% (  -5% -    6%) 0.520
                     LowSpanNear      191.64      (3.5%)      192.85      (3.7%)    0.6% (  -6% -    8%) 0.580
            MedTermDayTaxoFacets       55.51      (3.1%)       55.86      (3.9%)    0.6% (  -6% -    7%) 0.567
     BrowseRandomLabelTaxoFacets    11492.93      (5.0%)    11570.83      (4.8%)    0.7% (  -8% -   11%) 0.663
                          IntNRQ       93.40      (2.1%)       94.05      (2.4%)    0.7% (  -3% -    5%) 0.319
                      AndHighMed      175.02      (2.6%)      176.42      (3.9%)    0.8% (  -5% -    7%) 0.445
                          Fuzzy2       45.25      (7.2%)       45.64      (6.2%)    0.9% ( -11% -   15%) 0.682
                      AndHighLow      825.32      (6.8%)      833.43      (8.0%)    1.0% ( -12% -   16%) 0.677
                         MedTerm     1408.91      (6.2%)     1423.27     (10.2%)    1.0% ( -14% -   18%) 0.703
                       OrHighMed      136.68      (3.8%)      138.15      (3.6%)    1.1% (  -6% -    8%) 0.356
                      OrHighHigh       16.31      (3.4%)       16.49      (1.9%)    1.1% (  -4% -    6%) 0.205
       BrowseDayOfYearTaxoFacets    11349.30      (4.4%)    11494.17      (4.6%)    1.3% (  -7% -   10%) 0.366
                      HighPhrase       83.13      (2.9%)       84.24      (3.4%)    1.3% (  -4% -    7%) 0.184
                    OrHighNotMed      630.30      (5.6%)      639.65      (6.4%)    1.5% (  -9% -   14%) 0.436
                       LowPhrase      310.17      (4.2%)      315.08      (5.4%)    1.6% (  -7% -   11%) 0.297
                   OrHighNotHigh      723.22      (5.0%)      734.71      (8.4%)    1.6% ( -11% -   15%) 0.468
           BrowseMonthTaxoFacets    11665.05      (7.6%)    11892.66      (5.1%)    2.0% (  -9% -   15%) 0.339
                    OrHighNotLow      851.60      (6.5%)      869.16      (7.6%)    2.1% ( -11% -   17%) 0.355
                    OrNotHighMed      699.29      (5.2%)      717.74      (7.7%)    2.6% (  -9% -   16%) 0.205
                    OrNotHighLow      954.65      (6.4%)      982.93      (9.6%)    3.0% ( -12% -   20%) 0.252
                          LowTerm     2158.23      (9.1%)     2227.33     (13.4%)    3.2% ( -17% -   28%) 0.377
{code}



