mikemccand commented on pull request #518:
URL: https://github.com/apache/lucene/pull/518#issuecomment-999027826


   OK, thank you @uschindler and @rmuir for helping me debug the tricky setup!  
I ran this `perf.py` using `luceneutil`:
   
   ```
   import sys
   sys.path.insert(0, '/l/util/src/python')
   
   import competition
   
   if __name__ == '__main__':
     sourceData = competition.sourceData()
     comp = competition.Competition()
   
     checkout = 'trunk'
     checkoutNewMMap = 'trunk-new-mmap'
   
     index = comp.newIndex(checkout, sourceData, numThreads=12, addDVFields=True, verbose=True,
                           grouping=False, useCMS=True,
                           javaCommand='/opt/jdk-18-ea-28/bin/java --add-modules jdk.incubator.foreign -Xmx32g -Xms32g -server -XX:+UseParallelGC -Djava.io.tmpdir=/l/tmp',
                           analyzer = 'StandardAnalyzerNoStopWords',
                           facets = (('taxonomy:Date', 'Date'),
                                     ('taxonomy:Month', 'Month'),
                                     ('taxonomy:DayOfYear', 'DayOfYear'),
                                     ('taxonomy:RandomLabel.taxonomy', 'RandomLabel'),
                                     ('sortedset:Month', 'Month'),
                                     ('sortedset:DayOfYear', 'DayOfYear'),
                                     ('sortedset:RandomLabel.sortedset', 'RandomLabel')))
     comp.competitor('base', checkout, index=index,
                     javacCommand='/opt/jdk-18-ea-28/bin/javac',
                     javaCommand='/opt/jdk-18-ea-28/bin/java --add-modules jdk.incubator.foreign -Xmx32g -Xms32g -server -XX:+UseParallelGC -Djava.io.tmpdir=/l/tmp')
     comp.competitor('new-mmap', checkoutNewMMap, index=index,
                     javacCommand='/opt/jdk-18-ea-28/bin/javac',
                     javaCommand='/opt/jdk-18-ea-28/bin/java --add-modules jdk.incubator.foreign -Xmx32g -Xms32g -server -XX:+UseParallelGC -Djava.io.tmpdir=/l/tmp')
     comp.benchmark('new-mmap')
   ```
   
   I set my `JAVA_HOME` to JDK 17 (`17.0.1+12-LTS-39`) and `RUNTIME_JAVA_HOME` 
to JDK 18-ea-b28 (`18-ea+28-1975`).  I used git commit 
`119c7c29ae697a52c91116f2414f973509830267` from Lucene `main`, and then 
@uschindler's branch behind this PR.
   
   Here are the results after 20 JVM iterations:
   
   ```
                               Task    QPS base      StdDevQPS new-mmap      StdDev                Pct diff p-value
              BrowseMonthSSDVFacets        8.07     (12.6%)        7.18     (13.4%)  -11.0% ( -32% -   17%) 0.008
              BrowseMonthTaxoFacets        4.67      (5.7%)        4.33      (2.6%)   -7.2% ( -14% -    1%) 0.000
        BrowseRandomLabelSSDVFacets        5.34      (6.6%)        5.08      (6.4%)   -4.9% ( -16% -    8%) 0.017
                             IntNRQ       49.91      (7.0%)       48.07      (2.3%)   -3.7% ( -12% -    6%) 0.026
                           PKLookup      126.62      (4.6%)      122.06      (3.4%)   -3.6% ( -11% -    4%) 0.005
          BrowseDayOfYearSSDVFacets        7.46     (12.8%)        7.28     (16.8%)   -2.5% ( -28% -   31%) 0.598
                            Respell       25.49      (1.1%)       24.97      (1.2%)   -2.1% (  -4% -    0%) 0.000
                             Fuzzy1       40.18      (1.5%)       39.52      (1.4%)   -1.7% (  -4% -    1%) 0.000
                             Fuzzy2       31.18      (1.8%)       30.67      (1.5%)   -1.6% (  -4% -    1%) 0.002
                   HighSloppyPhrase       19.11      (5.7%)       18.99      (5.2%)   -0.6% ( -10% -   10%) 0.710
                           Wildcard       59.01      (6.8%)       58.89      (6.9%)   -0.2% ( -13% -   14%) 0.926
                    LowSloppyPhrase       14.92      (3.7%)       14.92      (3.4%)    0.0% (  -6% -    7%) 0.978
                    MedSloppyPhrase      117.00      (3.7%)      117.28      (3.2%)    0.2% (  -6% -    7%) 0.829
               MedTermDayTaxoFacets       22.39      (3.3%)       22.51      (4.2%)    0.5% (  -6% -    8%) 0.649
                            Prefix3       62.59      (5.3%)       62.99      (5.8%)    0.6% (  -9% -   12%) 0.713
        BrowseRandomLabelTaxoFacets        3.93      (3.9%)        3.95      (6.3%)    0.7% (  -9% -   11%) 0.669
                            LowTerm      678.95      (3.2%)      684.44      (4.4%)    0.8% (  -6% -    8%) 0.505
                          OrHighMed       61.65      (2.9%)       62.22      (2.1%)    0.9% (  -3% -    6%) 0.252
           AndHighHighDayTaxoFacets        5.64      (4.5%)        5.70      (4.1%)    1.0% (  -7% -   10%) 0.450
                         OrHighHigh       16.45      (3.1%)       16.63      (2.3%)    1.1% (  -4% -    6%) 0.220
                          MedPhrase      157.72      (2.1%)      159.52      (2.5%)    1.1% (  -3% -    5%) 0.117
                         HighPhrase      110.71      (3.9%)      112.10      (2.7%)    1.3% (  -5% -    8%) 0.237
                          OrHighLow      270.14      (3.2%)      274.07      (3.0%)    1.5% (  -4% -    7%) 0.135
               HighTermTitleBDVSort        7.37      (3.7%)        7.49      (3.2%)    1.5% (  -5% -    8%) 0.170
                        AndHighHigh       44.95      (5.4%)       45.63      (4.6%)    1.5% (  -7% -   12%) 0.336
                       HighSpanNear        7.27      (6.4%)        7.39      (5.2%)    1.6% (  -9% -   14%) 0.390
          BrowseDayOfYearTaxoFacets        4.37      (7.5%)        4.45      (9.8%)    1.8% ( -14% -   20%) 0.512
            AndHighMedDayTaxoFacets       63.88      (2.6%)       65.05      (1.3%)    1.8% (  -2% -    5%) 0.005
               BrowseDateTaxoFacets        4.37      (7.6%)        4.45     (10.0%)    1.8% ( -14% -   20%) 0.513
                         TermDTSort      379.61      (2.6%)      386.94      (2.2%)    1.9% (  -2% -    6%) 0.011
             OrHighMedDayTaxoFacets        5.48      (3.4%)        5.59      (4.5%)    2.0% (  -5% -   10%) 0.113
                        MedSpanNear        3.79      (2.3%)        3.86      (3.7%)    2.0% (  -3% -    8%) 0.042
              HighTermDayOfYearSort     1151.05      (4.4%)     1174.57      (6.2%)    2.0% (  -8% -   13%) 0.227
                         AndHighMed       56.38      (5.3%)       57.64      (5.9%)    2.2% (  -8% -   14%) 0.208
                           HighTerm      976.99      (6.7%)     1002.21      (6.8%)    2.6% ( -10% -   17%) 0.225
                LowIntervalsOrdered       12.43      (4.8%)       12.77      (5.2%)    2.8% (  -6% -   13%) 0.079
                        LowSpanNear        9.60      (2.4%)        9.87      (1.4%)    2.8% (   0% -    6%) 0.000
                       OrHighNotMed      598.12      (4.1%)      614.79      (4.2%)    2.8% (  -5% -   11%) 0.034
                  HighTermMonthSort       42.77     (14.2%)       44.03     (19.5%)    3.0% ( -26% -   42%) 0.584
                MedIntervalsOrdered       29.73      (4.0%)       30.68      (4.5%)    3.2% (  -5% -   12%) 0.017
                      OrNotHighHigh      555.82      (3.9%)      573.67      (4.3%)    3.2% (  -4% -   11%) 0.013
               HighIntervalsOrdered        4.36      (6.5%)        4.50      (5.9%)    3.3% (  -8% -   16%) 0.094
                       OrHighNotLow      699.58      (5.0%)      723.40      (5.0%)    3.4% (  -6% -   14%) 0.031
                       OrNotHighMed      511.29      (3.9%)      529.02      (3.6%)    3.5% (  -3% -   11%) 0.004
                       OrNotHighLow      419.51      (3.9%)      434.62      (2.6%)    3.6% (  -2% -   10%) 0.000
                          LowPhrase      241.42      (3.2%)      250.97      (2.1%)    4.0% (  -1% -    9%) 0.000
                      OrHighNotHigh      562.96      (3.9%)      585.87      (3.9%)    4.1% (  -3% -   12%) 0.001
                         AndHighLow      293.83      (5.5%)      306.09      (1.8%)    4.2% (  -2% -   12%) 0.001
                            MedTerm     1022.47      (6.6%)     1066.29      (4.4%)    4.3% (  -6% -   16%) 0.015
   ```
   
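   Some SSDV and Taxo facet tasks may have gotten a bit slower (e.g. `BrowseMonthSSDVFacets` at -11.0% and `BrowseMonthTaxoFacets` at -7.2%), while many queries got a bit faster (up to about 4%).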
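
As a quick sanity check on the `Pct diff` column, here is a minimal sketch that recomputes the relative QPS change from the two mean QPS columns. The helper name `pct_diff` is hypothetical and not part of `luceneutil`; because the inputs are the rounded values printed in the table, the result could differ from the reported figure in the last digit.

```python
def pct_diff(base_qps, new_qps):
    """Relative change of new_qps versus base_qps, in percent."""
    return 100.0 * (new_qps - base_qps) / base_qps


# (task, QPS base, QPS new-mmap) copied from three rows of the results table.
rows = [
    ("BrowseMonthSSDVFacets", 8.07, 7.18),
    ("PKLookup", 126.62, 122.06),
    ("MedTerm", 1022.47, 1066.29),
]

for task, base, new in rows:
    print(f"{task:>24} {pct_diff(base, new):+6.1f}%")
```

A negative value means the `new-mmap` competitor was slower than `base` for that task.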
   
   These were the merged CPU profile results for the new `mmap` impl:
   
   ```
   PROFILE SUMMARY from 894683 events (total: 894683)
     tests.profile.mode=cpu
     tests.profile.count=30
     tests.profile.stacksize=1
     tests.profile.linenumbers=false
   PERCENT       CPU SAMPLES   STACK
   4.27%         38211         org.apache.lucene.index.SingletonSortedNumericDocValues#nextDoc()
   4.15%         37164         org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
   3.56%         31835         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$20#ordValue()
   2.93%         26214         org.apache.lucene.util.packed.DirectReader$DirectPackedReader20#get()
   2.87%         25641         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$4#longValue()
   2.47%         22090         org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
   2.43%         21784         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#nextPosition()
   2.17%         19392         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$3#longValue()
   2.10%         18801         org.apache.lucene.search.ConjunctionDISI#doNext()
   2.10%         18781         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#advance()
   1.97%         17597         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#skipPositions()
   1.93%         17238         jdk.internal.foreign.AbstractMemorySegmentImpl#checkBoundsSmall()
   1.85%         16576         jdk.internal.misc.ScopedMemoryAccess#getByteInternal()
   1.81%         16231         org.apache.lucene.queries.spans.NearSpansOrdered#stretchToOrder()
   1.74%         15561         org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval()
   1.73%         15498         org.apache.lucene.store.MemorySegmentIndexInput$SingleSegmentImpl#readByte()
   1.53%         13721         jdk.internal.misc.ScopedMemoryAccess#getIntUnalignedInternal()
   1.49%         13317         jdk.internal.foreign.AbstractMemorySegmentImpl#isSet()
   1.38%         12362         org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#increment()
   1.34%         12016         org.apache.lucene.queries.spans.TermSpans#nextStartPosition()
   1.16%         10395         org.apache.lucene.search.TermScorer#score()
   1.16%         10338         jdk.internal.foreign.AbstractMemorySegmentImpl#checkBounds()
   1.10%         9856          org.apache.lucene.util.packed.DirectReader$DirectPackedReader4#get()
   1.01%         9014          org.apache.lucene.queries.intervals.IntervalFilter#nextInterval()
   0.96%         8580          jdk.internal.foreign.SharedScope#checkValidState()
   0.93%         8349          org.apache.lucene.index.SingletonSortedSetDocValues#getValueCount()
   0.90%         8020          org.apache.lucene.search.ScoreCachingWrappingScorer#score()
   0.86%         7654          org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$DenseNumericDocValues#advance()
   0.82%         7361          org.apache.lucene.queries.spans.SpanScorer#setFreqCurrentDoc()
   0.82%         7328          org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
   ```
   
   versus baseline CPU JFR profiler results:
   
   ```
   PROFILE SUMMARY from 894453 events (total: 894453)
     tests.profile.mode=cpu
     tests.profile.count=30
     tests.profile.stacksize=1
     tests.profile.linenumbers=false
   PERCENT       CPU SAMPLES   STACK
   5.93%         53070         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$20#ordValue()
   4.26%         38078         org.apache.lucene.index.SingletonSortedNumericDocValues#nextDoc()
   3.84%         34318         org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
   3.65%         32685         jdk.internal.misc.Unsafe#convEndian()
   2.86%         25554         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$4#longValue()
   2.74%         24483         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#nextPosition()
   2.64%         23617         org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
   2.18%         19515         org.apache.lucene.search.ConjunctionDISI#doNext()
   2.17%         19373         org.apache.lucene.util.packed.DirectReader$DirectPackedReader4#get()
   2.12%         18958         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$3#longValue()
   1.93%         17298         org.apache.lucene.util.packed.DirectReader$DirectPackedReader20#get()
   1.93%         17258         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#advance()
   1.82%         16284         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#skipPositions()
   1.75%         15647         org.apache.lucene.search.TermScorer#score()
   1.71%         15292         org.apache.lucene.codecs.lucene90.ForUtil#expand8()
   1.67%         14979         org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval()
   1.65%         14744         org.apache.lucene.store.ByteBufferGuard#ensureValid()
   1.57%         14061         org.apache.lucene.queries.spans.NearSpansOrdered#stretchToOrder()
   1.15%         10247         org.apache.lucene.queries.spans.TermSpans#nextStartPosition()
   1.14%         10222         java.util.Objects#checkIndex()
   1.12%         9990          java.nio.Buffer#scope()
   1.06%         9459          org.apache.lucene.store.ByteBufferGuard#getByte()
   0.98%         8724          org.apache.lucene.queries.intervals.IntervalFilter#nextInterval()
   0.91%         8179          org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
   0.88%         7906          org.apache.lucene.search.ScoreCachingWrappingScorer#score()
   0.88%         7867          org.apache.lucene.store.ByteBufferIndexInput#buildSlice()
   0.87%         7823          org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$DenseNumericDocValues#advance()
   0.87%         7789          org.apache.lucene.store.ByteBufferGuard#getInt()
   0.84%         7518          org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#increment()
   0.74%         6639          org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue()
   ```
   
   It's curious how costly `SingletonSortedNumericDocValues#nextDoc` is.  I 
think these facet fields are dense.
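
To compare the two profiles frame by frame, a small sketch like the following could parse the `PERCENT / CPU SAMPLES / STACK` rows and print the change in CPU share per frame. The helper names `parse_profile` and `share_delta` are hypothetical, and the inputs are just three rows excerpted from each summary above, with each row on a single line.

```python
def parse_profile(text):
    """Map each stack frame to its CPU share (percent) from summary rows."""
    shares = {}
    for line in text.splitlines():
        parts = line.split()
        # Expect rows shaped like: "4.27%  38211  some.Class#method()"
        if len(parts) == 3 and parts[0].endswith("%") and parts[1].isdigit():
            shares[parts[2]] = float(parts[0].rstrip("%"))
    return shares


def share_delta(base, new):
    """new minus base, in percentage points, for frames present in both."""
    return {f: round(new[f] - base[f], 2) for f in base.keys() & new.keys()}


NEW_MMAP = """
4.27%  38211  org.apache.lucene.index.SingletonSortedNumericDocValues#nextDoc()
3.56%  31835  org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$20#ordValue()
2.93%  26214  org.apache.lucene.util.packed.DirectReader$DirectPackedReader20#get()
"""

BASELINE = """
5.93%  53070  org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$20#ordValue()
4.26%  38078  org.apache.lucene.index.SingletonSortedNumericDocValues#nextDoc()
1.93%  17298  org.apache.lucene.util.packed.DirectReader$DirectPackedReader20#get()
"""

deltas = share_delta(parse_profile(BASELINE), parse_profile(NEW_MMAP))
# Print frames from largest decrease to largest increase in CPU share.
for frame, delta in sorted(deltas.items(), key=lambda kv: kv[1]):
    print(f"{delta:+.2f}  {frame}")
```

Frames that appear in only one of the two profiles (for example the `jdk.internal.foreign` or `ByteBufferGuard` frames) are skipped by this sketch, so it only shows how the shared Lucene frames shifted.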


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


