[jira] [Updated] (LUCENE-10334) Introduce a BlockReader based on ForUtil and use it for NumericDocValues

Feng Guo (Jira) Mon, 27 Dec 2021 09:12:14 -0800


     [ https://issues.apache.org/jira/browse/LUCENE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Feng Guo updated LUCENE-10334:
------------------------------
    Description: 
The previous discussion is here: [https://github.com/apache/lucene/pull/557]

This issue proposes adding a new BlockReader, based on ForUtil, to replace
the DirectReader currently used for NumericDocValues.
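
To make the idea concrete, here is a minimal sketch, not the code from the
pull request, contrasting per-value extraction from a bit-packed array
(roughly the access pattern of a direct reader) with decoding a whole block
into a buffer and serving reads from that buffer. All class and method names,
and the block size of 128, are assumptions made for illustration; none of
them are Lucene APIs.

```java
// Illustrative sketch only: these names are hypothetical, not Lucene APIs.
final class PackedSketch {
    static final int BLOCK_SIZE = 128; // assumed block size for this sketch

    /** Packs non-negative values that each fit in bitsPerValue bits (1..63). */
    static long[] pack(long[] values, int bitsPerValue) {
        int words = (int) (((long) values.length * bitsPerValue + 63) / 64);
        long[] packed = new long[words];
        for (int i = 0; i < values.length; i++) {
            long bitPos = (long) i * bitsPerValue;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            packed[word] |= values[i] << shift;
            if (shift + bitsPerValue > 64) {
                // The value straddles two longs; store its high bits in the next one.
                packed[word + 1] |= values[i] >>> (64 - shift);
            }
        }
        return packed;
    }

    /** Direct style: every call recomputes the bit offset and extracts one value. */
    static long directGet(long[] packed, int bitsPerValue, int index) {
        long bitPos = (long) index * bitsPerValue;
        int word = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = packed[word] >>> shift;
        if (shift + bitsPerValue > 64) {
            value |= packed[word + 1] << (64 - shift);
        }
        return value & ((1L << bitsPerValue) - 1);
    }

    /** Block style: decode BLOCK_SIZE values at once, then read from the buffer. */
    static final class BlockBufferedReader {
        private final long[] packed;
        private final int bitsPerValue;
        private final int count;
        private final long[] buffer = new long[BLOCK_SIZE];
        private int currentBlock = -1;

        BlockBufferedReader(long[] packed, int bitsPerValue, int count) {
            this.packed = packed;
            this.bitsPerValue = bitsPerValue;
            this.count = count;
        }

        long get(int index) {
            int block = index / BLOCK_SIZE;
            if (block != currentBlock) {
                decodeBlock(block);
            }
            return buffer[index % BLOCK_SIZE];
        }

        private void decodeBlock(int block) {
            int start = block * BLOCK_SIZE;
            int end = Math.min(start + BLOCK_SIZE, count);
            // A real block decoder would unpack many values per loop iteration;
            // this sketch reuses directGet to keep the logic easy to follow.
            for (int i = start; i < end; i++) {
                buffer[i - start] = directGet(packed, bitsPerValue, i);
            }
            currentBlock = block;
        }
    }

    public static void main(String[] args) {
        long[] values = new long[300];
        for (int i = 0; i < values.length; i++) {
            values[i] = (i * 37L) % 4096; // every value fits in 12 bits
        }
        long[] packed = pack(values, 12);
        BlockBufferedReader reader = new BlockBufferedReader(packed, 12, values.length);
        boolean same = true;
        for (int i = 0; i < values.length; i++) {
            same &= directGet(packed, 12, i) == values[i] && reader.get(i) == values[i];
        }
        System.out.println("direct and block reads agree: " + same);
    }
}
```

The block style pays the decoding cost once per block, which tends to help
sequential access such as iterating doc values in order; whether it helps
random access depends on how often a read lands in a block that is not
already decoded.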

-*Benchmark based on wiki10m*- (The previous benchmark results were wrong, so
I deleted them to avoid misleading readers; see the benchmark results in the
comments.)

  was:
The previous discussion is here: https://github.com/apache/lucene/pull/557

This issue proposes adding a new BlockReader, based on ForUtil, to replace
the DirectReader currently used for NumericDocValues.

*Benchmark based on wiki10m*

{code:java}
                            Task  QPS baseline   StdDev  QPS my_modified_version   StdDev                  Pct diff p-value
                   OrNotHighHigh        694.17   (8.2%)                   685.83   (7.0%)     -1.2% ( -15% -   15%) 0.618
                         Respell         75.15   (2.7%)                    74.32   (2.0%)     -1.1% (  -5% -    3%) 0.146
                         Prefix3        220.11   (5.1%)                   217.78   (5.8%)     -1.1% ( -11% -   10%) 0.541
                        Wildcard        129.75   (3.7%)                   128.63   (2.5%)     -0.9% (  -6% -    5%) 0.383
                     LowSpanNear         68.54   (2.1%)                    68.00   (2.4%)     -0.8% (  -5% -    3%) 0.269
                    OrNotHighMed        732.90   (6.8%)                   727.49   (5.3%)     -0.7% ( -12% -   12%) 0.703
     BrowseRandomLabelTaxoFacets      11879.03   (8.6%)                 11799.33   (5.5%)     -0.7% ( -13% -   14%) 0.769
                HighSloppyPhrase          6.87   (2.9%)                     6.83   (2.3%)     -0.6% (  -5% -    4%) 0.496
                    OrHighNotMed        827.54   (9.2%)                   822.94   (8.0%)     -0.6% ( -16% -   18%) 0.838
                     MedSpanNear         18.92   (5.7%)                    18.82   (5.6%)     -0.5% ( -11% -   11%) 0.759
          OrHighMedDayTaxoFacets         10.27   (4.0%)                    10.21   (4.3%)     -0.5% (  -8% -    8%) 0.676
                        PKLookup        207.98   (4.0%)                   206.85   (2.7%)     -0.5% (  -7% -    6%) 0.621
             LowIntervalsOrdered        159.17   (2.3%)                   158.32   (2.2%)     -0.5% (  -4% -    3%) 0.445
                    HighSpanNear          6.32   (4.2%)                     6.28   (4.1%)     -0.5% (  -8% -    8%) 0.691
             MedIntervalsOrdered         85.31   (3.2%)                    84.88   (2.9%)     -0.5% (  -6% -    5%) 0.607
                        HighTerm       1170.55   (5.8%)                  1164.79   (3.9%)     -0.5% (  -9% -    9%) 0.753
                 LowSloppyPhrase         14.54   (3.1%)                    14.48   (2.9%)     -0.4% (  -6% -    5%) 0.651
                      HighPhrase        112.81   (4.4%)                   112.39   (4.1%)     -0.4% (  -8% -    8%) 0.781
                    OrNotHighLow        858.02   (5.9%)                   854.99   (4.8%)     -0.4% ( -10% -   10%) 0.835
            HighIntervalsOrdered         25.08   (2.8%)                    25.00   (2.6%)     -0.3% (  -5% -    5%) 0.701
                       MedPhrase         27.20   (2.1%)                    27.11   (2.9%)     -0.3% (  -5% -    4%) 0.689
            MedTermDayTaxoFacets         81.55   (2.3%)                    81.35   (2.9%)     -0.3% (  -5% -    5%) 0.762
                          IntNRQ         63.36   (2.0%)                    63.21   (2.5%)     -0.2% (  -4% -    4%) 0.740
                          Fuzzy2         73.24   (5.5%)                    73.10   (6.2%)     -0.2% ( -11% -   12%) 0.916
         AndHighMedDayTaxoFacets         76.08   (3.5%)                    75.98   (3.4%)     -0.1% (  -6% -    7%) 0.905
                     AndHighHigh         62.20   (2.0%)                    62.18   (2.4%)     -0.0% (  -4% -    4%) 0.954
           BrowseMonthTaxoFacets      11993.48   (6.7%)                 11989.53   (4.8%)     -0.0% ( -10% -   12%) 0.986
                    OrHighNotLow        732.82   (7.2%)                   732.80   (6.2%)     -0.0% ( -12% -   14%) 0.999
                          Fuzzy1         46.43   (5.3%)                    46.45   (6.0%)      0.0% ( -10% -   11%) 0.989
                         LowTerm       1608.25   (6.0%)                  1608.84   (4.9%)      0.0% ( -10% -   11%) 0.983
                       OrHighMed         75.90   (2.3%)                    75.93   (1.8%)      0.0% (  -3% -    4%) 0.939
                       LowPhrase        273.81   (2.9%)                   274.04   (3.3%)      0.1% (  -5% -    6%) 0.932
                      AndHighLow        717.24   (6.1%)                   718.17   (3.3%)      0.1% (  -8% -   10%) 0.933
        AndHighHighDayTaxoFacets         39.63   (2.5%)                    39.69   (2.6%)      0.1% (  -4% -    5%) 0.862
                      OrHighHigh         34.63   (1.8%)                    34.68   (2.0%)      0.1% (  -3% -    4%) 0.821
                 MedSloppyPhrase        158.80   (2.8%)                   159.09   (2.6%)      0.2% (  -5% -    5%) 0.832
                       OrHighLow        257.77   (2.9%)                   258.46   (4.6%)      0.3% (  -7% -    8%) 0.826
                      AndHighMed        133.43   (2.1%)                   133.79   (2.7%)      0.3% (  -4% -    5%) 0.726
               HighTermMonthSort        145.28  (10.8%)                   145.88  (11.2%)      0.4% ( -19% -   25%) 0.905
                   OrHighNotHigh        834.99   (6.1%)                   839.62   (5.7%)      0.6% ( -10% -   13%) 0.766
                      TermDTSort         83.66   (9.6%)                    84.30  (11.1%)      0.8% ( -18% -   23%) 0.817
       BrowseDayOfYearTaxoFacets      11639.59   (5.1%)                 11777.38   (6.0%)      1.2% (  -9% -   12%) 0.502
                         MedTerm       1473.62   (7.4%)                  1493.79   (6.4%)      1.4% ( -11% -   16%) 0.530
            HighTermTitleBDVSort        114.98  (16.7%)                   117.30  (18.8%)      2.0% ( -28% -   45%) 0.720
           HighTermDayOfYearSort        128.29  (17.2%)                   132.83  (22.6%)      3.5% ( -30% -   52%) 0.577
            BrowseDateTaxoFacets         19.25  (20.4%)                    26.77   (3.7%)     39.1% (  12% -   79%) 0.000
     BrowseRandomLabelSSDVFacets         10.38   (3.5%)                    18.03   (6.8%)     73.7% (  61% -   87%) 0.000
           BrowseMonthSSDVFacets         15.71   (3.6%)                    34.59  (12.4%)    120.1% ( 100% -  141%) 0.000
       BrowseDayOfYearSSDVFacets         14.31   (3.3%)                    33.54  (12.9%)    134.4% ( 114% -  155%) 0.000
{code}
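
As a reading aid for the table above: the Pct diff column appears to be the
relative change in mean QPS from baseline to my_modified_version. A minimal
sketch of that arithmetic follows; the class and method names are
hypothetical, and this is not the benchmark tool's own code.

```java
import java.util.Locale;

// Hypothetical helper for reading the table; not part of any benchmark tool.
final class PctDiffSketch {
    /** Relative change from baseline to candidate, expressed in percent. */
    static double pctDiff(double baselineQps, double candidateQps) {
        return 100.0 * (candidateQps - baselineQps) / baselineQps;
    }

    public static void main(String[] args) {
        // QPS values taken from the BrowseDateTaxoFacets row.
        System.out.println(String.format(Locale.ROOT, "%.1f%%", pctDiff(19.25, 26.77)));
    }
}
```

For BrowseDateTaxoFacets this is (26.77 - 19.25) / 19.25, about 39.1%, which
agrees with that row. Other rows can differ in the last digit because the
QPS columns are themselves rounded.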

*candidate*
{code:java}
PERCENT       CPU SAMPLES   STACK
3.48%         9280          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#advance()
3.41%         9082          org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
3.32%         8836          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#nextPosition()
2.72%         7260          org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$20#ordValue()
2.03%         5423          org.apache.lucene.queries.spans.NearSpansOrdered#stretchToOrder()
1.91%         5094          org.apache.lucene.queries.spans.TermSpans#nextStartPosition()
1.90%         5063          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#advance()
1.80%         4787          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#findFirstGreater()
1.72%         4574          org.apache.lucene.search.PhraseScorer$1#matches()
1.55%         4141          org.apache.lucene.util.PriorityQueue#upHeap()
1.55%         4141          org.apache.lucene.codecs.lucene90.ForUtil#expand8()
1.53%         4073          org.apache.lucene.queries.spans.NearSpansOrdered#nextStartPosition()
1.47%         3929          org.apache.lucene.util.packed.BlockReader#get()
1.39%         3703          org.apache.lucene.search.ConjunctionDISI#doNext()
1.35%         3593          org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue()
1.32%         3514          jdk.internal.misc.ScopedMemoryAccess#getByteInternal()
1.21%         3236          org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score()
1.13%         3003          org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$20#nextDoc()
1.05%         2808          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#nextPosition()
1.04%         2780          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#skipPositions()
1.03%         2750          org.apache.lucene.search.BooleanScorer$OrCollector#collect()
1.03%         2732          org.apache.lucene.search.SloppyPhraseMatcher#maxFreq()
0.99%         2627          org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval()
0.98%         2610          org.apache.lucene.search.MultiCollector$MultiLeafCollector#collect()
0.89%         2368          org.apache.lucene.search.SloppyPhraseMatcher#initSimple()
0.88%         2350          org.apache.lucene.queries.spans.NearSpansOrdered#advancePosition()
0.87%         2312          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#advance()
0.84%         2252          org.apache.lucene.util.PriorityQueue#add()
0.82%         2176          org.apache.lucene.queries.spans.SpanScorer#setFreqCurrentDoc()
0.81%         2161          org.apache.lucene.codecs.lucene90.PForUtil#decode()

*baseline*
{code:java}
PERCENT       CPU SAMPLES   STACK
4.22%         12298         org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
3.25%         9468          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#advance()
3.04%         8872          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#nextPosition()
2.26%         6576          org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$20#ordValue()
2.06%         5993          org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get()
1.90%         5528          org.apache.lucene.queries.spans.NearSpansOrdered#stretchToOrder()
1.81%         5266          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#advance()
1.75%         5116          org.apache.lucene.queries.spans.TermSpans#nextStartPosition()
1.53%         4469          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#findFirstGreater()
1.51%         4392          org.apache.lucene.search.PhraseScorer$1#matches()
1.44%         4204          org.apache.lucene.util.PriorityQueue#upHeap()
1.37%         3999          jdk.internal.misc.ScopedMemoryAccess#getByteInternal()
1.37%         3992          org.apache.lucene.codecs.lucene90.ForUtil#expand8()
1.37%         3991          org.apache.lucene.queries.spans.NearSpansOrdered#nextStartPosition()
1.33%         3869          org.apache.lucene.search.ConjunctionDISI#doNext()
1.27%         3688          org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$20#nextDoc()
1.24%         3606          java.nio.Buffer#scope()
1.23%         3593          org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue()
1.20%         3491          org.apache.lucene.util.packed.DirectReader$DirectPackedReader20#get()
1.16%         3392          java.nio.Buffer#checkIndex()
1.09%         3186          org.apache.lucene.util.packed.DirectReader$DirectPackedReader4#get()
1.09%         3164          org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score()
1.01%         2946          org.apache.lucene.store.ByteBufferGuard#ensureValid()
0.95%         2772          org.apache.lucene.search.BooleanScorer$OrCollector#collect()
0.95%         2766          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#skipPositions()
0.95%         2763          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#nextPosition()
0.93%         2699          org.apache.lucene.search.SloppyPhraseMatcher#maxFreq()
0.92%         2678          org.apache.lucene.search.MultiCollector$MultiLeafCollector#collect()
0.87%         2545          org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval()
0.85%         2479          org.apache.lucene.search.SloppyPhraseMatcher#initSimple()
{code}




> Introduce a BlockReader based on ForUtil and use it for NumericDocValues
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-10334
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10334
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Feng Guo
>            Priority: Major
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> The previous discussion is here: [https://github.com/apache/lucene/pull/557]
> This issue proposes adding a new BlockReader, based on ForUtil, to replace
> the DirectReader currently used for NumericDocValues.
> -*Benchmark based on wiki10m*- (The previous benchmark results were wrong, so
> I deleted them to avoid misleading readers; see the benchmark results in the
> comments.)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
