shubhamvishu commented on PR #12868: URL: https://github.com/apache/lucene/pull/12868#issuecomment-1838068402
So I ran the `luceneutil` benchmarks with `-idFieldPostingsFormat BloomFilter`, but the run failed: there was no delegate postings format, and the right postings format class could not be found via SPI. I tweaked [this code](https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/Indexer.java#L458-L462) (pasted below) to use `BloomFilteringPostingsFormat` for the id field and also added the codecs jar ([similar to how it's done for core](https://github.com/mikemccand/luceneutil/blob/master/src/python/benchUtil.py#L1612-L1621)), and then everything worked.

```java
public PostingsFormat getPostingsFormatForField(String field) {
  PostingsFormat pf = PostingsFormat.forName(defaultPostingsFormat);
  // Wrap the default postings format in a bloom filter for the id field only
  if (field.equals("id")) {
    return new BloomFilteringPostingsFormat(pf);
  }
  return pf;
}
```

Below are the `wikimediumall` benchmark results (run twice for more confidence), which show a ~**7-9%** performance improvement for `PKLookup` with a p-value of `0.000`.

**Run # 1**

```
Task                         QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                  p-value
BrowseRandomLabelSSDVFacets          2.77   (7.0%)                     2.72   (5.1%)     -1.7%  ( -12% -  11%)    0.374
BrowseRandomLabelTaxoFacets          3.35  (21.8%)                     3.30  (18.1%)     -1.6%  ( -34% -  48%)    0.803
LowSloppyPhrase                      6.11   (2.9%)                     6.05   (2.5%)     -1.0%  (  -6% -   4%)    0.248
HighTermTitleSort                  128.04   (3.7%)                   126.78   (3.1%)     -1.0%  (  -7% -   6%)    0.365
HighSloppyPhrase                    13.43   (2.7%)                    13.31   (2.6%)     -0.9%  (  -6% -   4%)    0.254
MedSloppyPhrase                      4.72   (3.4%)                     4.68   (2.6%)     -0.8%  (  -6% -   5%)    0.393
OrHighMedDayTaxoFacets               2.99   (5.2%)                     2.97   (3.5%)     -0.8%  (  -9% -   8%)    0.582
BrowseDateTaxoFacets                 3.85  (19.0%)                     3.82  (17.7%)     -0.8%  ( -31% -  44%)    0.896
BrowseDayOfYearTaxoFacets            3.86  (19.0%)                     3.83  (17.9%)     -0.7%  ( -31% -  44%)    0.900
OrHighMed                           72.86   (2.0%)                    72.58   (2.4%)     -0.4%  (  -4% -   4%)    0.586
OrHighHigh                          20.04   (3.1%)                    19.98   (4.3%)     -0.3%  (  -7% -   7%)    0.801
HighPhrase                          24.66   (6.2%)                    24.59   (6.6%)     -0.3%  ( -12% -  13%)    0.882
Prefix3                             72.76   (5.0%)                    72.59   (4.3%)     -0.2%  (  -9% -   9%)    0.873
MedTerm                            379.26   (3.1%)                   378.51   (5.1%)     -0.2%  (  -8% -   8%)    0.882
MedPhrase                           18.99   (5.5%)                    18.95   (5.9%)     -0.2%  ( -10% -  11%)    0.915
BrowseDateSSDVFacets                 0.90   (7.7%)                     0.89   (7.4%)     -0.2%  ( -14% -  16%)    0.944
LowPhrase                           63.34   (2.4%)                    63.27   (2.8%)     -0.1%  (  -5% -   5%)    0.887
HighSpanNear                         6.18   (3.1%)                     6.18   (3.2%)      0.0%  (  -6% -   6%)    0.993
Fuzzy1                              64.49   (1.1%)                    64.52   (1.4%)      0.0%  (  -2% -   2%)    0.919
LowSpanNear                         15.60   (2.3%)                    15.61   (2.5%)      0.1%  (  -4% -   5%)    0.916
HighTermTitleBDVSort                 4.90   (4.2%)                     4.90   (3.8%)      0.1%  (  -7% -   8%)    0.938
MedTermDayTaxoFacets                 9.26   (5.6%)                     9.27   (4.1%)      0.1%  (  -9% -  10%)    0.948
AndHighMedDayTaxoFacets             13.99   (1.7%)                    14.02   (1.5%)      0.2%  (  -3% -   3%)    0.764
AndHighHighDayTaxoFacets             5.26   (2.7%)                     5.27   (2.4%)      0.2%  (  -4% -   5%)    0.819
Respell                             29.96   (1.1%)                    30.03   (1.8%)      0.2%  (  -2% -   3%)    0.656
LowTerm                            398.65   (2.5%)                   399.67   (2.9%)      0.3%  (  -5% -   5%)    0.765
MedSpanNear                         38.84   (2.6%)                    38.96   (3.0%)      0.3%  (  -5% -   6%)    0.729
Wildcard                            60.70   (1.6%)                    60.89   (1.1%)      0.3%  (  -2% -   3%)    0.463
HighTermDayOfYearSort              200.04   (2.5%)                   200.70   (3.7%)      0.3%  (  -5% -   6%)    0.740
OrHighLow                          252.74   (2.0%)                   253.62   (2.4%)      0.3%  (  -3% -   4%)    0.613
OrNotHighHigh                      157.04   (4.6%)                   157.63   (4.2%)      0.4%  (  -8% -   9%)    0.789
AndHighHigh                         29.31   (2.4%)                    29.42   (3.5%)      0.4%  (  -5% -   6%)    0.678
OrNotHighLow                       290.56   (2.1%)                   291.81   (1.7%)      0.4%  (  -3% -   4%)    0.475
OrNotHighMed                       221.77   (3.5%)                   222.84   (2.9%)      0.5%  (  -5% -   7%)    0.633
OrHighNotHigh                      167.01   (4.8%)                   167.85   (4.5%)      0.5%  (  -8% -  10%)    0.731
HighTerm                           279.21   (4.3%)                   280.66   (6.5%)      0.5%  (  -9% -  11%)    0.767
AndHighLow                         374.84   (1.9%)                   377.10   (1.8%)      0.6%  (  -3% -   4%)    0.308
HighTermMonthSort                 2378.06   (3.6%)                  2392.49   (4.1%)      0.6%  (  -6% -   8%)    0.618
LowIntervalsOrdered                 12.78   (2.4%)                    12.86   (2.8%)      0.6%  (  -4% -   6%)    0.443
OrHighNotLow                       269.59   (4.8%)                   271.34   (4.9%)      0.6%  (  -8% -  10%)    0.672
IntNRQ                              18.20   (5.9%)                    18.32   (5.4%)      0.7%  ( -10% -  12%)    0.709
AndHighMed                          37.67   (2.5%)                    37.93   (3.5%)      0.7%  (  -5% -   6%)    0.459
MedIntervalsOrdered                  1.80   (3.5%)                     1.82   (3.7%)      0.8%  (  -6% -   8%)    0.456
OrHighNotMed                       248.23   (4.7%)                   250.34   (4.5%)      0.9%  (  -7% -  10%)    0.556
Fuzzy2                              35.10   (1.2%)                    35.42   (1.2%)      0.9%  (  -1% -   3%)    0.016
BrowseMonthTaxoFacets                4.13  (30.7%)                     4.17  (34.8%)      1.0%  ( -49% -  96%)    0.924
BrowseMonthSSDVFacets                4.37   (9.6%)                     4.45   (8.9%)      1.9%  ( -15% -  22%)    0.506
HighIntervalsOrdered                 1.58   (4.6%)                     1.61   (5.6%)      2.0%  (  -7% -  12%)    0.228
TermDTSort                          96.97   (3.2%)                    98.92   (5.0%)      2.0%  (  -6% -  10%)    0.132
BrowseDayOfYearSSDVFacets            3.76   (9.6%)                     3.85   (6.5%)      2.3%  ( -12% -  20%)    0.366
PKLookup                           106.69   (1.5%)                   114.71   (1.5%)      7.5%  (   4% -  10%)    0.000
```

**Run # 2**

```
Task                         QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                  p-value
BrowseMonthTaxoFacets                4.08  (31.7%)                     3.80   (1.8%)     -6.7%  ( -30% -  39%)    0.342
BrowseRandomLabelTaxoFacets          3.30  (21.9%)                     3.16   (2.3%)     -4.2%  ( -23% -  25%)    0.393
BrowseDateTaxoFacets                 3.85  (20.0%)                     3.72   (6.1%)     -3.5%  ( -24% -  28%)    0.456
BrowseDayOfYearTaxoFacets            3.86  (20.0%)                     3.73   (6.2%)     -3.3%  ( -24% -  28%)    0.476
OrHighHigh                          26.35   (6.9%)                    25.88   (6.8%)     -1.8%  ( -14% -  12%)    0.404
HighTermDayOfYearSort              209.99   (3.7%)                   206.48   (4.1%)     -1.7%  (  -9% -   6%)    0.177
OrHighLow                          273.41   (2.3%)                   270.80   (3.0%)     -1.0%  (  -6% -   4%)    0.252
HighTermMonthSort                 2346.36   (3.2%)                  2326.36   (3.6%)     -0.9%  (  -7% -   6%)    0.427
HighTermTitleBDVSort                 4.83   (3.7%)                     4.80   (3.4%)     -0.7%  (  -7% -   6%)    0.522
Prefix3                            584.55   (3.3%)                   580.95   (3.1%)     -0.6%  (  -6% -   5%)    0.541
OrHighMed                           78.07   (2.7%)                    77.60   (2.9%)     -0.6%  (  -6% -   5%)    0.489
TermDTSort                          92.32   (4.1%)                    91.89   (4.4%)     -0.5%  (  -8% -   8%)    0.732
HighTermTitleSort                  138.64   (2.4%)                   138.05   (2.5%)     -0.4%  (  -5% -   4%)    0.580
AndHighHigh                         20.00   (5.1%)                    19.93   (4.3%)     -0.4%  (  -9% -   9%)    0.791
BrowseDateSSDVFacets                 0.90   (9.5%)                     0.90   (8.0%)     -0.4%  ( -16% -  18%)    0.891
Fuzzy2                              39.56   (1.2%)                    39.41   (1.4%)     -0.4%  (  -2% -   2%)    0.362
AndHighHighDayTaxoFacets             2.08   (4.1%)                     2.07   (4.0%)     -0.3%  (  -8% -   8%)    0.798
Fuzzy1                              66.33   (1.0%)                    66.17   (1.1%)     -0.2%  (  -2% -   1%)    0.471
Wildcard                            40.31   (3.9%)                    40.22   (3.7%)     -0.2%  (  -7% -   7%)    0.856
HighSloppyPhrase                    11.01   (1.8%)                    11.00   (2.2%)     -0.1%  (  -4% -   3%)    0.852
OrHighNotLow                       219.55   (7.5%)                   219.38   (7.3%)     -0.1%  ( -13% -  15%)    0.974
Respell                             50.51   (1.6%)                    50.48   (1.5%)     -0.1%  (  -3% -   3%)    0.915
IntNRQ                              18.46   (8.9%)                    18.46   (9.1%)     -0.0%  ( -16% -  19%)    0.988
AndHighMed                          82.85   (3.1%)                    82.84   (2.5%)     -0.0%  (  -5% -   5%)    0.982
OrNotHighLow                       512.93   (2.0%)                   512.86   (1.9%)     -0.0%  (  -3% -   3%)    0.982
LowSpanNear                         64.37   (2.4%)                    64.44   (2.7%)      0.1%  (  -4% -   5%)    0.886
OrNotHighHigh                      278.80   (6.4%)                   279.28   (6.0%)      0.2%  ( -11% -  13%)    0.931
LowTerm                            351.93   (4.1%)                   352.53   (4.4%)      0.2%  (  -7% -   9%)    0.898
OrNotHighMed                       201.78   (5.3%)                   202.14   (5.1%)      0.2%  (  -9% -  11%)    0.913
OrHighNotHigh                      196.39   (6.5%)                   196.74   (6.5%)      0.2%  ( -11% -  14%)    0.930
LowSloppyPhrase                      4.06   (4.1%)                     4.07   (4.6%)      0.2%  (  -8% -   9%)    0.865
AndHighMedDayTaxoFacets             29.95   (1.5%)                    30.04   (1.7%)      0.3%  (  -2% -   3%)    0.577
OrHighMedDayTaxoFacets               3.47   (5.7%)                     3.48   (4.3%)      0.3%  (  -9% -  10%)    0.857
MedIntervalsOrdered                  7.68   (6.1%)                     7.71   (6.6%)      0.4%  ( -11% -  13%)    0.858
MedTerm                            462.78   (5.3%)                   464.47   (6.6%)      0.4%  ( -10% -  12%)    0.847
AndHighLow                         274.17   (2.2%)                   275.22   (2.5%)      0.4%  (  -4% -   5%)    0.606
HighSpanNear                         3.88   (3.9%)                     3.90   (4.8%)      0.5%  (  -7% -   9%)    0.738
BrowseDayOfYearSSDVFacets            3.60   (8.3%)                     3.62   (9.9%)      0.5%  ( -16% -  20%)    0.863
OrHighNotMed                       286.50   (6.4%)                   287.93   (6.3%)      0.5%  ( -11% -  14%)    0.803
LowIntervalsOrdered                  4.82   (3.9%)                     4.85   (3.8%)      0.5%  (  -6% -   8%)    0.678
MedSpanNear                          4.81   (3.1%)                     4.84   (4.0%)      0.6%  (  -6% -   7%)    0.616
HighIntervalsOrdered                 4.37   (4.8%)                     4.40   (5.0%)      0.6%  (  -8% -  10%)    0.700
MedTermDayTaxoFacets                11.79   (2.7%)                    11.86   (3.1%)      0.6%  (  -5% -   6%)    0.507
MedSloppyPhrase                     36.65   (5.2%)                    36.95   (4.6%)      0.8%  (  -8% -  11%)    0.592
HighTerm                           318.60   (6.5%)                   322.10   (7.5%)      1.1%  ( -12% -  16%)    0.621
LowPhrase                           31.99   (4.0%)                    32.35   (2.0%)      1.1%  (  -4% -   7%)    0.255
BrowseMonthSSDVFacets                4.35  (13.5%)                     4.41  (14.8%)      1.3%  ( -23% -  34%)    0.771
MedPhrase                           54.81   (3.5%)                    55.56   (2.4%)      1.4%  (  -4% -   7%)    0.147
BrowseRandomLabelSSDVFacets          2.68   (8.5%)                     2.73   (8.2%)      1.6%  ( -13% -  19%)    0.556
HighPhrase                           3.07   (6.6%)                     3.15   (4.3%)      2.5%  (  -7% -  14%)    0.150
PKLookup                           105.96   (1.1%)                   115.03   (1.4%)      8.6%  (   5% -  11%)    0.000
```

--
This is an automated message from the Apache Git Service.
