mikemccand commented on pull request #518: URL: https://github.com/apache/lucene/pull/518#issuecomment-999027826
OK, thank you @uschindler and @rmuir for helping me debug the tricky setup!

I ran this `perf.py` using `luceneutil`:

```
import sys
sys.path.insert(0, '/l/util/src/python')

import competition

if __name__ == '__main__':
  sourceData = competition.sourceData()
  comp = competition.Competition()

  checkout = 'trunk'
  checkoutNewMMap = 'trunk-new-mmap'

  index = comp.newIndex(checkout, sourceData,
                        numThreads=12,
                        addDVFields=True,
                        verbose=True,
                        grouping=False,
                        useCMS=True,
                        javaCommand='/opt/jdk-18-ea-28/bin/java --add-modules jdk.incubator.foreign -Xmx32g -Xms32g -server -XX:+UseParallelGC -Djava.io.tmpdir=/l/tmp',
                        analyzer = 'StandardAnalyzerNoStopWords',
                        facets = (('taxonomy:Date', 'Date'),
                                  ('taxonomy:Month', 'Month'),
                                  ('taxonomy:DayOfYear', 'DayOfYear'),
                                  ('taxonomy:RandomLabel.taxonomy', 'RandomLabel'),
                                  ('sortedset:Month', 'Month'),
                                  ('sortedset:DayOfYear', 'DayOfYear'),
                                  ('sortedset:RandomLabel.sortedset', 'RandomLabel')))

  comp.competitor('base', checkout,
                  index=index,
                  javacCommand='/opt/jdk-18-ea-28/bin/javac',
                  javaCommand='/opt/jdk-18-ea-28/bin/java --add-modules jdk.incubator.foreign -Xmx32g -Xms32g -server -XX:+UseParallelGC -Djava.io.tmpdir=/l/tmp')

  comp.competitor('new-mmap', checkoutNewMMap,
                  index=index,
                  javacCommand='/opt/jdk-18-ea-28/bin/javac',
                  javaCommand='/opt/jdk-18-ea-28/bin/java --add-modules jdk.incubator.foreign -Xmx32g -Xms32g -server -XX:+UseParallelGC -Djava.io.tmpdir=/l/tmp')

  comp.benchmark('new-mmap')
```

I set my `JAVA_HOME` to JDK 17 (`17.0.1+12-LTS-39`) and `RUNTIME_JAVA_HOME` to JDK 18-ea-b28 (`18-ea+28-1975`). I used git commit `119c7c29ae697a52c91116f2414f973509830267` from Lucene `main`, and then @uschindler's branch behind this PR.

Here's the results after 20 JVM iterations:

```
Task                        QPS base   StdDev QPS new-mmap   StdDev Pct diff                 p-value
BrowseMonthSSDVFacets           8.07  (12.6%)         7.18  (13.4%)   -11.0%  (-32% -  17%)  0.008
BrowseMonthTaxoFacets           4.67   (5.7%)         4.33   (2.6%)    -7.2%  (-14% -   1%)  0.000
BrowseRandomLabelSSDVFacets     5.34   (6.6%)         5.08   (6.4%)    -4.9%  (-16% -   8%)  0.017
IntNRQ                         49.91   (7.0%)        48.07   (2.3%)    -3.7%  (-12% -   6%)  0.026
PKLookup                      126.62   (4.6%)       122.06   (3.4%)    -3.6%  (-11% -   4%)  0.005
BrowseDayOfYearSSDVFacets       7.46  (12.8%)         7.28  (16.8%)    -2.5%  (-28% -  31%)  0.598
Respell                        25.49   (1.1%)        24.97   (1.2%)    -2.1%  ( -4% -   0%)  0.000
Fuzzy1                         40.18   (1.5%)        39.52   (1.4%)    -1.7%  ( -4% -   1%)  0.000
Fuzzy2                         31.18   (1.8%)        30.67   (1.5%)    -1.6%  ( -4% -   1%)  0.002
HighSloppyPhrase               19.11   (5.7%)        18.99   (5.2%)    -0.6%  (-10% -  10%)  0.710
Wildcard                       59.01   (6.8%)        58.89   (6.9%)    -0.2%  (-13% -  14%)  0.926
LowSloppyPhrase                14.92   (3.7%)        14.92   (3.4%)     0.0%  ( -6% -   7%)  0.978
MedSloppyPhrase               117.00   (3.7%)       117.28   (3.2%)     0.2%  ( -6% -   7%)  0.829
MedTermDayTaxoFacets           22.39   (3.3%)        22.51   (4.2%)     0.5%  ( -6% -   8%)  0.649
Prefix3                        62.59   (5.3%)        62.99   (5.8%)     0.6%  ( -9% -  12%)  0.713
BrowseRandomLabelTaxoFacets     3.93   (3.9%)         3.95   (6.3%)     0.7%  ( -9% -  11%)  0.669
LowTerm                       678.95   (3.2%)       684.44   (4.4%)     0.8%  ( -6% -   8%)  0.505
OrHighMed                      61.65   (2.9%)        62.22   (2.1%)     0.9%  ( -3% -   6%)  0.252
AndHighHighDayTaxoFacets        5.64   (4.5%)         5.70   (4.1%)     1.0%  ( -7% -  10%)  0.450
OrHighHigh                     16.45   (3.1%)        16.63   (2.3%)     1.1%  ( -4% -   6%)  0.220
MedPhrase                     157.72   (2.1%)       159.52   (2.5%)     1.1%  ( -3% -   5%)  0.117
HighPhrase                    110.71   (3.9%)       112.10   (2.7%)     1.3%  ( -5% -   8%)  0.237
OrHighLow                     270.14   (3.2%)       274.07   (3.0%)     1.5%  ( -4% -   7%)  0.135
HighTermTitleBDVSort            7.37   (3.7%)         7.49   (3.2%)     1.5%  ( -5% -   8%)  0.170
AndHighHigh                    44.95   (5.4%)        45.63   (4.6%)     1.5%  ( -7% -  12%)  0.336
HighSpanNear                    7.27   (6.4%)         7.39   (5.2%)     1.6%  ( -9% -  14%)  0.390
BrowseDayOfYearTaxoFacets       4.37   (7.5%)         4.45   (9.8%)     1.8%  (-14% -  20%)  0.512
AndHighMedDayTaxoFacets        63.88   (2.6%)        65.05   (1.3%)     1.8%  ( -2% -   5%)  0.005
BrowseDateTaxoFacets            4.37   (7.6%)         4.45  (10.0%)     1.8%  (-14% -  20%)  0.513
TermDTSort                    379.61   (2.6%)       386.94   (2.2%)     1.9%  ( -2% -   6%)  0.011
OrHighMedDayTaxoFacets          5.48   (3.4%)         5.59   (4.5%)     2.0%  ( -5% -  10%)  0.113
MedSpanNear                     3.79   (2.3%)         3.86   (3.7%)     2.0%  ( -3% -   8%)  0.042
HighTermDayOfYearSort        1151.05   (4.4%)      1174.57   (6.2%)     2.0%  ( -8% -  13%)  0.227
AndHighMed                     56.38   (5.3%)        57.64   (5.9%)     2.2%  ( -8% -  14%)  0.208
HighTerm                      976.99   (6.7%)      1002.21   (6.8%)     2.6%  (-10% -  17%)  0.225
LowIntervalsOrdered            12.43   (4.8%)        12.77   (5.2%)     2.8%  ( -6% -  13%)  0.079
LowSpanNear                     9.60   (2.4%)         9.87   (1.4%)     2.8%  (  0% -   6%)  0.000
OrHighNotMed                  598.12   (4.1%)       614.79   (4.2%)     2.8%  ( -5% -  11%)  0.034
HighTermMonthSort              42.77  (14.2%)        44.03  (19.5%)     3.0%  (-26% -  42%)  0.584
MedIntervalsOrdered            29.73   (4.0%)        30.68   (4.5%)     3.2%  ( -5% -  12%)  0.017
OrNotHighHigh                 555.82   (3.9%)       573.67   (4.3%)     3.2%  ( -4% -  11%)  0.013
HighIntervalsOrdered            4.36   (6.5%)         4.50   (5.9%)     3.3%  ( -8% -  16%)  0.094
OrHighNotLow                  699.58   (5.0%)       723.40   (5.0%)     3.4%  ( -6% -  14%)  0.031
OrNotHighMed                  511.29   (3.9%)       529.02   (3.6%)     3.5%  ( -3% -  11%)  0.004
OrNotHighLow                  419.51   (3.9%)       434.62   (2.6%)     3.6%  ( -2% -  10%)  0.000
LowPhrase                     241.42   (3.2%)       250.97   (2.1%)     4.0%  ( -1% -   9%)  0.000
OrHighNotHigh                 562.96   (3.9%)       585.87   (3.9%)     4.1%  ( -3% -  12%)  0.001
AndHighLow                    293.83   (5.5%)       306.09   (1.8%)     4.2%  ( -2% -  12%)  0.001
MedTerm                      1022.47   (6.6%)      1066.29   (4.4%)     4.3%  ( -6% -  16%)  0.015
```

SSDV and Taxo facets maybe got a bit slower, and lots of queries got a bit faster.
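
For readers who want to sanity-check the `Pct diff` column or pick out the rows with small p-values, here is a minimal Python sketch over a handful of rows copied from the table. The `ROWS` sample, the helper names, and the `ALPHA = 0.05` cutoff are assumptions made for illustration; they are not part of `luceneutil`. Since the table prints QPS rounded to two decimals, the recomputed percentages may differ from the reported ones by about 0.1.

```python
# A few rows copied from the benchmark table:
# (task, base QPS, new-mmap QPS, reported pct diff, p-value)
ROWS = [
    ("BrowseMonthSSDVFacets", 8.07, 7.18, -11.0, 0.008),
    ("BrowseMonthTaxoFacets", 4.67, 4.33, -7.2, 0.000),
    ("PKLookup", 126.62, 122.06, -3.6, 0.005),
    ("BrowseDayOfYearSSDVFacets", 7.46, 7.28, -2.5, 0.598),
    ("Wildcard", 59.01, 58.89, -0.2, 0.926),
    ("LowPhrase", 241.42, 250.97, 4.0, 0.000),
    ("MedTerm", 1022.47, 1066.29, 4.3, 0.015),
]

# Assumed significance cutoff; the original comment does not state one.
ALPHA = 0.05


def pct_diff(base_qps, new_qps):
    """Relative change of new-mmap QPS versus base QPS, in percent."""
    return 100.0 * (new_qps - base_qps) / base_qps


def significant(rows, alpha=ALPHA):
    """Return (task, recomputed pct diff) for rows whose p-value is below alpha."""
    return [
        (task, round(pct_diff(base, new), 1))
        for task, base, new, _reported, p in rows
        if p < alpha
    ]


if __name__ == "__main__":
    for task, diff in significant(ROWS):
        print(f"{task:28s} {diff:+.1f}%")
```

On this sample, the two noisy rows (`BrowseDayOfYearSSDVFacets` and `Wildcard`) drop out, which matches the reading that only some of the facet slowdowns and query speedups stand out from run-to-run variance.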

This was the merged CPU profile results for this new `mmap` impl:

```
PROFILE SUMMARY from 894683 events (total: 894683)
  tests.profile.mode=cpu
  tests.profile.count=30
  tests.profile.stacksize=1
  tests.profile.linenumbers=false
PERCENT  CPU SAMPLES  STACK
4.27%    38211        org.apache.lucene.index.SingletonSortedNumericDocValues#nextDoc()
4.15%    37164        org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
3.56%    31835        org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$20#ordValue()
2.93%    26214        org.apache.lucene.util.packed.DirectReader$DirectPackedReader20#get()
2.87%    25641        org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$4#longValue()
2.47%    22090        org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
2.43%    21784        org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#nextPosition()
2.17%    19392        org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$3#longValue()
2.10%    18801        org.apache.lucene.search.ConjunctionDISI#doNext()
2.10%    18781        org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#advance()
1.97%    17597        org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#skipPositions()
1.93%    17238        jdk.internal.foreign.AbstractMemorySegmentImpl#checkBoundsSmall()
1.85%    16576        jdk.internal.misc.ScopedMemoryAccess#getByteInternal()
1.81%    16231        org.apache.lucene.queries.spans.NearSpansOrdered#stretchToOrder()
1.74%    15561        org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval()
1.73%    15498        org.apache.lucene.store.MemorySegmentIndexInput$SingleSegmentImpl#readByte()
1.53%    13721        jdk.internal.misc.ScopedMemoryAccess#getIntUnalignedInternal()
1.49%    13317        jdk.internal.foreign.AbstractMemorySegmentImpl#isSet()
1.38%    12362        org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#increment()
1.34%    12016        org.apache.lucene.queries.spans.TermSpans#nextStartPosition()
1.16%    10395        org.apache.lucene.search.TermScorer#score()
1.16%    10338        jdk.internal.foreign.AbstractMemorySegmentImpl#checkBounds()
1.10%    9856         org.apache.lucene.util.packed.DirectReader$DirectPackedReader4#get()
1.01%    9014         org.apache.lucene.queries.intervals.IntervalFilter#nextInterval()
0.96%    8580         jdk.internal.foreign.SharedScope#checkValidState()
0.93%    8349         org.apache.lucene.index.SingletonSortedSetDocValues#getValueCount()
0.90%    8020         org.apache.lucene.search.ScoreCachingWrappingScorer#score()
0.86%    7654         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$DenseNumericDocValues#advance()
0.82%    7361         org.apache.lucene.queries.spans.SpanScorer#setFreqCurrentDoc()
0.82%    7328         org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
```

versus baseline CPU JFR profiler results:

```
PROFILE SUMMARY from 894453 events (total: 894453)
  tests.profile.mode=cpu
  tests.profile.count=30
  tests.profile.stacksize=1
  tests.profile.linenumbers=false
PERCENT  CPU SAMPLES  STACK
5.93%    53070        org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$20#ordValue()
4.26%    38078        org.apache.lucene.index.SingletonSortedNumericDocValues#nextDoc()
3.84%    34318        org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
3.65%    32685        jdk.internal.misc.Unsafe#convEndian()
2.86%    25554        org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$4#longValue()
2.74%    24483        org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#nextPosition()
2.64%    23617        org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
2.18%    19515        org.apache.lucene.search.ConjunctionDISI#doNext()
2.17%    19373        org.apache.lucene.util.packed.DirectReader$DirectPackedReader4#get()
2.12%    18958        org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$3#longValue()
1.93%    17298        org.apache.lucene.util.packed.DirectReader$DirectPackedReader20#get()
1.93%    17258        org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#advance()
1.82%    16284        org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#skipPositions()
1.75%    15647        org.apache.lucene.search.TermScorer#score()
1.71%    15292        org.apache.lucene.codecs.lucene90.ForUtil#expand8()
1.67%    14979        org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval()
1.65%    14744        org.apache.lucene.store.ByteBufferGuard#ensureValid()
1.57%    14061        org.apache.lucene.queries.spans.NearSpansOrdered#stretchToOrder()
1.15%    10247        org.apache.lucene.queries.spans.TermSpans#nextStartPosition()
1.14%    10222        java.util.Objects#checkIndex()
1.12%    9990         java.nio.Buffer#scope()
1.06%    9459         org.apache.lucene.store.ByteBufferGuard#getByte()
0.98%    8724         org.apache.lucene.queries.intervals.IntervalFilter#nextInterval()
0.91%    8179         org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
0.88%    7906         org.apache.lucene.search.ScoreCachingWrappingScorer#score()
0.88%    7867         org.apache.lucene.store.ByteBufferIndexInput#buildSlice()
0.87%    7823         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$DenseNumericDocValues#advance()
0.87%    7789         org.apache.lucene.store.ByteBufferGuard#getInt()
0.84%    7518         org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#increment()
0.74%    6639         org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue()
```

It's curious how costly `SingletonSortedNumericDocValues#nextDoc` is. I think these facet fields are dense.
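
One rough way to compare the two profiles is to add up the samples attributed to the memory-access layer in each. The Python sketch below does that with sample counts copied from the listings above. Which frames belong to that layer is an assumption made for this sketch (in particular `java.util.Objects#checkIndex()` may also be called from non-I/O code), and with `tests.profile.stacksize=1` and only the top 30 frames shown, the totals are at best a lower bound.

```python
# Event totals from the two "PROFILE SUMMARY" headers.
NEW_MMAP_TOTAL = 894683
BASE_TOTAL = 894453

# Frames assumed (for this sketch only) to belong to the memory-access layer
# of the new MemorySegment-based implementation.
NEW_MMAP_IO = {
    "jdk.internal.foreign.AbstractMemorySegmentImpl#checkBoundsSmall()": 17238,
    "jdk.internal.misc.ScopedMemoryAccess#getByteInternal()": 16576,
    "org.apache.lucene.store.MemorySegmentIndexInput$SingleSegmentImpl#readByte()": 15498,
    "jdk.internal.misc.ScopedMemoryAccess#getIntUnalignedInternal()": 13721,
    "jdk.internal.foreign.AbstractMemorySegmentImpl#isSet()": 13317,
    "jdk.internal.foreign.AbstractMemorySegmentImpl#checkBounds()": 10338,
    "jdk.internal.foreign.SharedScope#checkValidState()": 8580,
}

# Frames assumed to play the same role in the ByteBuffer-based baseline.
BASE_IO = {
    "jdk.internal.misc.Unsafe#convEndian()": 32685,
    "org.apache.lucene.store.ByteBufferGuard#ensureValid()": 14744,
    "java.util.Objects#checkIndex()": 10222,
    "java.nio.Buffer#scope()": 9990,
    "org.apache.lucene.store.ByteBufferGuard#getByte()": 9459,
    "org.apache.lucene.store.ByteBufferIndexInput#buildSlice()": 7867,
    "org.apache.lucene.store.ByteBufferGuard#getInt()": 7789,
}


def share(frames, total_events):
    """Percentage of all profiled events that landed in the given frames."""
    return 100.0 * sum(frames.values()) / total_events


if __name__ == "__main__":
    print(f"new-mmap: {share(NEW_MMAP_IO, NEW_MMAP_TOTAL):.2f}% of samples")
    print(f"base:     {share(BASE_IO, BASE_TOTAL):.2f}% of samples")
```

Under these assumptions the two implementations spend a similar share of the visible samples in their access layers, so the grouping is only a starting point; a deeper stack size would be needed to attribute the cost to specific callers.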