[ https://issues.apache.org/jira/browse/LUCENE-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Feng Guo updated LUCENE-10333:
------------------------------
Description:
*Description*
In {{Lucene90DocValuesProducer}}, {{BinaryDocValues}} (as well as
{{SortedNumericDocValues}} in the non-singleton case) has code patterns like this:
{code:java}
long startOffset = addresses.get(doc);
bytes.length = (int) (addresses.get(doc + 1L) - startOffset);
{code}
This means we need to read two longs that are stored next to each other. We could
push this knowledge down to {{LongValues}} and read both values in a single call.
I think this makes sense because this code can be rather hot.
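A minimal sketch of what such a push-down could look like, assuming a hypothetical method {{getPair(long index, long[] out)}}. {{PairReadingLongValues}}, {{ArrayLongValues}}, {{getPair}} and {{OffsetsExample.valueLength}} are illustrative names only. They are not part of Lucene's actual {{LongValues}} API. The default implementation falls back to two {{get}} calls, and an implementation that keeps the values contiguous can override it to serve both at once:

{code:java}
/**
 * Hypothetical sketch only: a LongValues-like base class with an extra
 * method that reads two consecutive values in one call. This is NOT the
 * real org.apache.lucene.util.LongValues API.
 */
abstract class PairReadingLongValues {
  /** Returns the value stored at {@code index}. */
  abstract long get(long index);

  /**
   * Hypothetical batch read: writes the values at {@code index} and
   * {@code index + 1} into {@code out[0]} and {@code out[1]}.
   * The default simply performs two independent lookups.
   */
  void getPair(long index, long[] out) {
    out[0] = get(index);
    out[1] = get(index + 1);
  }
}

/** Illustrative array-backed implementation that serves both values at once. */
final class ArrayLongValues extends PairReadingLongValues {
  private final long[] values;

  ArrayLongValues(long[] values) {
    this.values = values;
  }

  @Override
  long get(long index) {
    return values[Math.toIntExact(index)];
  }

  @Override
  void getPair(long index, long[] out) {
    // The two values are adjacent, so a single arraycopy fetches both.
    System.arraycopy(values, Math.toIntExact(index), out, 0, 2);
  }
}

/** Hypothetical helper mirroring the addresses.get(doc) / addresses.get(doc + 1L) pattern. */
final class OffsetsExample {
  static int valueLength(PairReadingLongValues addresses, int doc, long[] scratch) {
    addresses.getPair(doc, scratch); // one call instead of two
    long startOffset = scratch[0];
    return (int) (scratch[1] - startOffset);
  }
}
{code}

For example, with offsets {{[0, 3, 3, 10]}}, {{valueLength}} returns 3 for doc 0, 0 for doc 1 and 7 for doc 2. Whether a real reader gains anything depends on whether it can decode both values more cheaply than two separate lookups; the array-backed class only shows the shape of the API.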
*Benchmark*
In today's luceneutil benchmark, all results look even. I suspect this is
because no task uses {{BinaryDocValues}} any more. So I rolled both the baseline
and the candidate back to an older code version (before
https://issues.apache.org/jira/browse/LUCENE-10062), in which taxonomy ordinals
were stored in {{BinaryDocValues}}, and there a QPS increase can be seen (about
25% to 27% on the BrowseDayOfYearTaxoFacets, BrowseDateTaxoFacets and
BrowseMonthTaxoFacets tasks). (This is a bit of a workaround; I wonder if we
could have a more official way to benchmark {{BinaryDocValues}} by changing some
parameters or adding some tasks?) Anyway, I believe it is still worth optimizing
{{BinaryDocValues}} even though facets no longer use it :)
*Benchmark results on the older code version, where taxonomy ordinals are stored
in BinaryDocValues (to justify a speedup in BinaryDocValues)*
{code:java}
Task                         QPS baseline    StdDev  QPS my_modified_version    StdDev  Pct diff                p-value
BrowseMonthSSDVFacets               17.25    (8.6%)                    16.78   (17.8%)     -2.7% ( -26% -  25%)   0.536
LowTerm                           1458.66    (3.6%)                  1438.15    (4.4%)     -1.4% (  -9% -   6%)   0.268
HighTermDayOfYearSort              108.55   (10.0%)                   108.04    (9.1%)     -0.5% ( -17% -  20%)   0.874
HighPhrase                         168.65    (1.9%)                   168.06    (2.3%)     -0.3% (  -4% -   3%)   0.602
OrNotHighLow                      1201.79    (3.4%)                  1197.93    (4.6%)     -0.3% (  -8% -   7%)   0.801
HighSpanNear                        15.26    (1.6%)                    15.21    (1.4%)     -0.3% (  -3% -   2%)   0.499
Respell                             62.61    (1.8%)                    62.45    (1.9%)     -0.3% (  -3% -   3%)   0.649
MedPhrase                           57.57    (1.4%)                    57.44    (1.8%)     -0.2% (  -3% -   2%)   0.648
OrHighMed                          129.10    (3.0%)                   128.83    (3.1%)     -0.2% (  -6% -   6%)   0.830
MedSpanNear                         19.45    (2.3%)                    19.41    (2.2%)     -0.2% (  -4% -   4%)   0.784
OrHighHigh                          34.85    (1.5%)                    34.79    (1.4%)     -0.2% (  -3% -   2%)   0.722
HighIntervalsOrdered                26.92    (4.7%)                    26.89    (4.9%)     -0.1% (  -9% -   9%)   0.929
IntNRQ                             343.52    (1.6%)                   343.16    (2.0%)     -0.1% (  -3% -   3%)   0.855
OrHighNotHigh                      595.61    (3.2%)                   595.10    (4.3%)     -0.1% (  -7% -   7%)   0.944
MedIntervalsOrdered                 17.66    (3.6%)                    17.65    (3.8%)     -0.1% (  -7% -   7%)   0.961
LowIntervalsOrdered                109.23    (3.3%)                   109.18    (3.5%)     -0.0% (  -6% -   7%)   0.969
AndHighHigh                         81.09    (1.5%)                    81.10    (2.0%)      0.0% (  -3% -   3%)   0.967
LowSpanNear                        203.33    (2.1%)                   203.41    (1.8%)      0.0% (  -3% -   3%)   0.948
MedSloppyPhrase                     27.15    (1.5%)                    27.17    (1.2%)      0.1% (  -2% -   2%)   0.907
LowPhrase                           75.76    (1.8%)                    75.81    (2.0%)      0.1% (  -3% -   3%)   0.904
AndHighMedDayTaxoFacets             97.27    (1.9%)                    97.35    (1.9%)      0.1% (  -3% -   4%)   0.888
HighSloppyPhrase                    14.32    (2.7%)                    14.34    (1.8%)      0.1% (  -4% -   4%)   0.870
Fuzzy2                              76.00    (3.9%)                    76.12    (3.4%)      0.2% (  -6% -   7%)   0.894
Wildcard                           123.51    (1.8%)                   123.71    (2.1%)      0.2% (  -3% -   4%)   0.796
OrHighNotLow                       722.64    (4.4%)                   724.15    (5.4%)      0.2% (  -9% -  10%)   0.894
AndHighLow                         929.73    (4.0%)                   931.75    (3.8%)      0.2% (  -7% -   8%)   0.859
Prefix3                            240.13    (1.5%)                   240.69    (1.9%)      0.2% (  -3% -   3%)   0.675
AndHighMed                         210.17    (1.7%)                   210.84    (1.6%)      0.3% (  -2% -   3%)   0.532
LowSloppyPhrase                    142.83    (1.8%)                   143.54    (2.0%)      0.5% (  -3% -   4%)   0.410
OrNotHighMed                       709.24    (4.4%)                   712.78    (4.3%)      0.5% (  -7% -   9%)   0.715
Fuzzy1                              85.33    (5.7%)                    85.77    (6.3%)      0.5% ( -10% -  13%)   0.786
MedTerm                           1466.50    (3.5%)                  1474.85    (3.9%)      0.6% (  -6% -   8%)   0.629
TermDTSort                         105.51    (7.7%)                   106.33    (7.3%)      0.8% ( -13% -  17%)   0.746
PKLookup                           206.18    (2.9%)                   208.68    (2.9%)      1.2% (  -4% -   7%)   0.179
OrHighNotMed                       876.71    (3.0%)                   887.84    (3.9%)      1.3% (  -5% -   8%)   0.251
OrNotHighHigh                      774.25    (4.7%)                   785.03    (6.0%)      1.4% (  -8% -  12%)   0.411
HighTermMonthSort                   74.33    (9.4%)                    75.47   (16.3%)      1.5% ( -22% -  30%)   0.716
OrHighLow                          518.73    (5.2%)                   528.27    (5.4%)      1.8% (  -8% -  13%)   0.272
HighTerm                          1892.16    (3.4%)                  1934.63    (5.5%)      2.2% (  -6% -  11%)   0.120
AndHighHighDayTaxoFacets            16.46    (2.7%)                    16.84    (2.3%)      2.3% (  -2% -   7%)   0.004
HighTermTitleBDVSort               141.39   (14.6%)                   145.33   (15.1%)      2.8% ( -23% -  38%)   0.554
MedTermDayTaxoFacets                27.81    (2.1%)                    29.54    (2.3%)      6.2% (   1% -  10%)   0.000
OrHighMedDayTaxoFacets               3.05    (1.9%)                     3.30    (2.2%)      8.3% (   4% -  12%)   0.000
BrowseDayOfYearSSDVFacets           17.36   (13.0%)                    18.97   (15.8%)      9.3% ( -17% -  43%)   0.042
BrowseDayOfYearTaxoFacets            3.02    (3.6%)                     3.79    (2.5%)     25.4% (  18% -  32%)   0.000
BrowseDateTaxoFacets                 3.01    (3.6%)                     3.79    (2.5%)     25.6% (  18% -  32%)   0.000
BrowseMonthTaxoFacets                3.14    (2.1%)                     3.99    (2.5%)     27.0% (  21% -  32%)   0.000
{code}
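For reading the table above: Pct diff is the relative QPS change of the candidate ({{my_modified_version}}) against the baseline. A minimal sketch of that arithmetic, assuming the simple formula (candidate - baseline) / baseline; {{QpsDiff}} is an illustrative name, not luceneutil code:

{code:java}
/** Hypothetical helper: relative QPS change, assuming (candidate - baseline) / baseline. */
final class QpsDiff {
  /** Returns the candidate's QPS change versus the baseline, in percent. */
  static double pctDiff(double baselineQps, double candidateQps) {
    return (candidateQps - baselineQps) / baselineQps * 100.0;
  }
}
{code}

For BrowseMonthTaxoFacets, {{pctDiff(3.14, 3.99)}} gives about 27.07%, close to the reported 27.0%; the small gap is presumably because the benchmark works from unrounded QPS values rather than the two decimals shown.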
*Benchmark results on the newest code version*
{code:java}
Task                         QPS baseline    StdDev  QPS my_modified_version    StdDev  Pct diff                p-value
TermDTSort                         129.74   (10.9%)                   127.83   (11.3%)     -1.5% ( -21% -  23%)   0.675
HighTerm                          1182.13    (5.1%)                  1172.76    (6.5%)     -0.8% ( -11% -  11%)   0.668
HighSpanNear                         7.99    (4.2%)                     7.96    (4.2%)     -0.3% (  -8% -   8%)   0.816
HighIntervalsOrdered                17.86    (2.1%)                    17.85    (2.3%)     -0.1% (  -4% -   4%)   0.927
BrowseDateTaxoFacets                19.61   (17.2%)                    19.61   (17.4%)     -0.0% ( -29% -  41%)   0.995
OrNotHighHigh                      619.85    (4.3%)                   619.72    (8.6%)     -0.0% ( -12% -  13%)   0.992
PKLookup                           202.14    (5.6%)                   202.11    (4.4%)     -0.0% (  -9% -  10%)   0.994
LowIntervalsOrdered                 25.53    (1.5%)                    25.53    (1.6%)      0.0% (  -3% -   3%)   1.000
BrowseDayOfYearSSDVFacets           14.27    (2.7%)                    14.28    (2.7%)      0.0% (  -5% -   5%)   0.965
MedIntervalsOrdered                 47.33    (1.9%)                    47.34    (2.0%)      0.0% (  -3% -   3%)   0.947
BrowseRandomLabelSSDVFacets         10.25    (2.4%)                    10.26    (2.4%)      0.1% (  -4% -   4%)   0.935
BrowseMonthSSDVFacets               15.66    (3.0%)                    15.67    (3.0%)      0.1% (  -5% -   6%)   0.945
MedSloppyPhrase                     11.97    (1.7%)                    11.98    (1.9%)      0.1% (  -3% -   3%)   0.840
Wildcard                            25.71    (2.6%)                    25.75    (2.4%)      0.1% (  -4% -   5%)   0.875
MedPhrase                           33.62    (2.5%)                    33.68    (2.6%)      0.2% (  -4% -   5%)   0.802
HighTermDayOfYearSort               80.58   (11.0%)                    80.76   (10.6%)      0.2% ( -19% -  24%)   0.949
HighTermTitleBDVSort               130.43   (11.7%)                   130.73   (10.7%)      0.2% ( -19% -  25%)   0.947
AndHighHighDayTaxoFacets            32.25    (3.0%)                    32.33    (2.9%)      0.2% (  -5% -   6%)   0.796
LowSloppyPhrase                     39.50    (1.7%)                    39.61    (1.4%)      0.3% (  -2% -   3%)   0.586
Prefix3                            127.42    (3.8%)                   127.77    (3.4%)      0.3% (  -6% -   7%)   0.812
HighTermMonthSort                  117.65    (8.4%)                   117.98    (8.1%)      0.3% ( -14% -  18%)   0.915
HighSloppyPhrase                    14.47    (1.8%)                    14.51    (2.2%)      0.3% (  -3% -   4%)   0.647
MedSpanNear                         48.78    (2.2%)                    48.93    (2.0%)      0.3% (  -3% -   4%)   0.640
OrHighMedDayTaxoFacets              13.42    (3.7%)                    13.48    (3.6%)      0.4% (  -6% -   7%)   0.730
AndHighMedDayTaxoFacets             37.90    (3.0%)                    38.05    (3.4%)      0.4% (  -5% -   7%)   0.694
Fuzzy1                              83.31    (3.9%)                    83.70    (4.9%)      0.5% (  -7% -   9%)   0.738
Respell                             49.74    (1.3%)                    50.00    (1.5%)      0.5% (  -2% -   3%)   0.254
OrHighLow                          531.57    (8.0%)                   534.83    (6.7%)      0.6% ( -13% -  16%)   0.792
AndHighHigh                         71.99    (2.6%)                    72.44    (3.4%)      0.6% (  -5% -   6%)   0.520
LowSpanNear                        191.64    (3.5%)                   192.85    (3.7%)      0.6% (  -6% -   8%)   0.580
MedTermDayTaxoFacets                55.51    (3.1%)                    55.86    (3.9%)      0.6% (  -6% -   7%)   0.567
BrowseRandomLabelTaxoFacets      11492.93    (5.0%)                 11570.83    (4.8%)      0.7% (  -8% -  11%)   0.663
IntNRQ                              93.40    (2.1%)                    94.05    (2.4%)      0.7% (  -3% -   5%)   0.319
AndHighMed                         175.02    (2.6%)                   176.42    (3.9%)      0.8% (  -5% -   7%)   0.445
Fuzzy2                              45.25    (7.2%)                    45.64    (6.2%)      0.9% ( -11% -  15%)   0.682
AndHighLow                         825.32    (6.8%)                   833.43    (8.0%)      1.0% ( -12% -  16%)   0.677
MedTerm                           1408.91    (6.2%)                  1423.27   (10.2%)      1.0% ( -14% -  18%)   0.703
OrHighMed                          136.68    (3.8%)                   138.15    (3.6%)      1.1% (  -6% -   8%)   0.356
OrHighHigh                          16.31    (3.4%)                    16.49    (1.9%)      1.1% (  -4% -   6%)   0.205
BrowseDayOfYearTaxoFacets        11349.30    (4.4%)                 11494.17    (4.6%)      1.3% (  -7% -  10%)   0.366
HighPhrase                          83.13    (2.9%)                    84.24    (3.4%)      1.3% (  -4% -   7%)   0.184
OrHighNotMed                       630.30    (5.6%)                   639.65    (6.4%)      1.5% (  -9% -  14%)   0.436
LowPhrase                          310.17    (4.2%)                   315.08    (5.4%)      1.6% (  -7% -  11%)   0.297
OrHighNotHigh                      723.22    (5.0%)                   734.71    (8.4%)      1.6% ( -11% -  15%)   0.468
BrowseMonthTaxoFacets            11665.05    (7.6%)                 11892.66    (5.1%)      2.0% (  -9% -  15%)   0.339
OrHighNotLow                       851.60    (6.5%)                   869.16    (7.6%)      2.1% ( -11% -  17%)   0.355
OrNotHighMed                       699.29    (5.2%)                   717.74    (7.7%)      2.6% (  -9% -  16%)   0.205
OrNotHighLow                       954.65    (6.4%)                   982.93    (9.6%)      3.0% ( -12% -  20%)   0.252
LowTerm                           2158.23    (9.1%)                  2227.33   (13.4%)      3.2% ( -17% -  28%)   0.377
{code}
> Speed up BinaryDocValues with a batch reading on LongValues
> -----------------------------------------------------------
>
> Key: LUCENE-10333
> URL: https://issues.apache.org/jira/browse/LUCENE-10333
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs
> Reporter: Feng Guo
> Priority: Minor
--
This message was sent by Atlassian Jira
(v8.20.1#820001)