Hi folks,

Egor Potemkin and I have been digging into a baffling performance regression that appears in response to a one-line change which, as far as we can tell, shouldn't have any performance impact whatsoever. There's more background on why we're trying to understand this, but I'll save the broader context for now and focus on the confusing issue itself.

Inside IndexSearcher, we've staged a change that initializes an ArrayList of Collectors slightly earlier than we do today (see: https://github.com/apache/lucene/pull/13657/files). We end up with code that looks like this (note the isolated line that initializes `collectors`):

```
public <C extends Collector, T> T search(Query query, CollectorManager<C, T> collectorManager)
    throws IOException {
  final LeafSlice[] leafSlices = getSlices();
  final C firstCollector = collectorManager.newCollector();
  query = rewrite(query, firstCollector.scoreMode().needsScores());
  final Weight weight = createWeight(query, firstCollector.scoreMode(), 1);

  final List<C> collectors = new ArrayList<>(leafSlices.length);

  return search(weight, collectorManager, firstCollector, collectors, leafSlices);
}
```

What's baffling is that if we initialize the `collectors` list _after_ the call to `createWeight` (as shown here), there's no performance impact at all (as expected). But if all we do is initialize `collectors` _before_ the call to `createWeight`, we see a very significant regression on the LowTerm, MedTerm, and HighTerm tasks in luceneutil (e.g., 15% - 30%). At the other end, we see a significant improvement on OrHighNotLow, OrHighNotMed, and OrHighNotHigh (e.g., 7% - 15%). (This is running wikimedium10m on an x86-based AWS EC2 host, but the results were reproduced separately by Egor and in our nightly benchmark runs; full luceneutil output is at the bottom of this email [1].)

Some additional context and conversation is captured in this "demo" PR: https://github.com/apache/lucene/pull/13657.

My only hunch is that this has something to do with HotSpot's decision making or some other runtime optimization, but I'm getting out of my depth and hoping someone in this community will have ideas on how to continue the investigation. Does anyone have a clue what might be going on, or any suggestions for other things to look at?

This isn't a purely academic exercise, for what it's worth.
This oddity has caused us to duplicate some code in IndexSearcher to work with a new sandbox faceting module, so it would be nice to figure this out so we can remove the duplication. (The duplication is pretty minor, but it's still really frustrating, and it's a trap waiting for anyone who later tries to consolidate the code and runs into this.)

Thanks for reading, and thanks in advance for any ideas!

Cheers,
-Greg

[1] Full luceneutil output:

```
Task                        QPS baseline  StdDev QPS my_modified_version  StdDev Pct diff               p-value
MedTerm                           513.21  (4.9%)                  369.43  (4.8%)   -28.0% (-35% - -19%)   0.000
HighTerm                          523.20  (6.9%)                  402.11  (5.0%)   -23.1% (-32% - -12%)   0.000
LowTerm                           837.70  (3.9%)                  715.94  (3.9%)   -14.5% (-21% -  -6%)   0.000
BrowseDayOfYearSSDVFacets          11.97 (18.9%)                   11.31 (11.9%)    -5.5% (-30% -  31%)   0.273
MedTermDayTaxoFacets               23.03  (4.9%)                   21.95  (6.4%)    -4.7% (-15% -   6%)   0.009
HighPhrase                        143.93  (8.3%)                  139.35  (4.7%)    -3.2% (-14% -  10%)   0.136
Fuzzy2                             53.03  (9.0%)                   51.50  (7.3%)    -2.9% (-17% -  14%)   0.265
MedSpanNear                        50.70  (5.1%)                   49.26  (3.0%)    -2.8% (-10% -   5%)   0.032
LowPhrase                          70.38  (4.9%)                   68.60  (5.3%)    -2.5% (-12% -   8%)   0.118
MedPhrase                          88.15  (5.2%)                   86.03  (4.2%)    -2.4% (-11% -   7%)   0.105
OrHighMedDayTaxoFacets              7.01  (5.5%)                    6.86  (5.4%)    -2.0% (-12% -   9%)   0.237
HighSpanNear                       28.95  (2.7%)                   28.42  (2.9%)    -1.8% ( -7% -   3%)   0.043
MedSloppyPhrase                   201.71  (3.3%)                  198.58  (3.1%)    -1.6% ( -7% -   4%)   0.124
BrowseDateTaxoFacets               23.97 (28.7%)                   23.62 (22.8%)    -1.5% (-41% -  70%)   0.858
AndHighMedDayTaxoFacets            32.81  (5.8%)                   32.35  (7.1%)    -1.4% (-13% -  12%)   0.493
AndHighHighDayTaxoFacets           27.86  (6.1%)                   27.50  (6.5%)    -1.3% (-13% -  12%)   0.507
LowSloppyPhrase                   149.20  (2.9%)                  147.50  (3.0%)    -1.1% ( -6% -   4%)   0.227
HighTermTitleBDVSort               66.72  (6.6%)                   66.04  (5.7%)    -1.0% (-12% -  12%)   0.604
AndHighHigh                       187.45  (7.4%)                  185.75  (6.7%)    -0.9% (-13% -  14%)   0.684
LowSpanNear                       102.21  (2.1%)                  101.50  (1.5%)    -0.7% ( -4% -   3%)   0.242
OrHighHigh                        218.06  (6.3%)                  216.74  (4.1%)    -0.6% (-10% -  10%)   0.721
HighTermTitleSort                 132.14  (1.5%)                  131.93  (1.3%)    -0.2% ( -2% -   2%)   0.724
HighSloppyPhrase                   31.43  (5.4%)                   31.39  (6.6%)    -0.1% (-11% -  12%)   0.949
BrowseRandomLabelSSDVFacets         7.91 (10.2%)                    7.91 (11.2%)    -0.0% (-19% -  23%)   0.992
AndHighMed                        288.24  (4.9%)                  288.33  (4.0%)     0.0% ( -8% -   9%)   0.982
AndHighLow                       1339.09  (3.2%)                 1345.87  (4.8%)     0.5% ( -7% -   8%)   0.694
OrHighMed                         473.22  (3.9%)                  476.21  (3.8%)     0.6% ( -6% -   8%)   0.603
BrowseDayOfYearTaxoFacets          23.67 (28.7%)                   23.82 (23.5%)     0.6% (-40% -  74%)   0.938
HighTermDayOfYearSort             415.29  (5.2%)                  418.26  (5.9%)     0.7% ( -9% -  12%)   0.684
BrowseDateSSDVFacets                2.14 (21.4%)                    2.16 (22.4%)     1.0% (-35% -  56%)   0.887
Wildcard                          489.21  (4.3%)                  494.69  (4.5%)     1.1% ( -7% -  10%)   0.420
TermDTSort                        216.56  (5.9%)                  219.04  (4.8%)     1.1% ( -8% -  12%)   0.499
PKLookup                          139.24  (8.7%)                  140.89 (10.8%)     1.2% (-16% -  22%)   0.703
Fuzzy1                             74.44  (9.7%)                   75.42  (8.3%)     1.3% (-15% -  21%)   0.643
Respell                            48.52  (7.2%)                   49.20  (6.6%)     1.4% (-11% -  16%)   0.519
OrNotHighLow                     1260.39  (3.0%)                 1279.03  (2.7%)     1.5% ( -4% -   7%)   0.101
MedIntervalsOrdered               132.03  (9.2%)                  134.25 (12.6%)     1.7% (-18% -  25%)   0.630
BrowseMonthTaxoFacets              24.51 (26.9%)                   25.02 (25.5%)     2.0% (-39% -  74%)   0.804
HighTermMonthSort                1117.15  (4.1%)                 1143.38  (4.6%)     2.3% ( -6% -  11%)   0.090
BrowseRandomLabelTaxoFacets        15.54 (25.0%)                   15.93 (19.7%)     2.5% (-33% -  62%)   0.724
Prefix3                           667.73 (11.1%)                  684.51 (11.1%)     2.5% (-17% -  27%)   0.474
LowIntervalsOrdered               118.38 (14.5%)                  121.55 (14.8%)     2.7% (-23% -  37%)   0.564
HighIntervalsOrdered               30.52  (9.2%)                   31.34  (7.0%)     2.7% (-12% -  20%)   0.298
OrNotHighMed                      365.66  (5.9%)                  376.73  (6.1%)     3.0% ( -8% -  15%)   0.110
OrHighLow                         586.67  (5.7%)                  608.48  (5.6%)     3.7% ( -7% -  15%)   0.037
OrNotHighHigh                     257.09  (5.8%)                  267.66  (6.5%)     4.1% ( -7% -  17%)   0.034
BrowseMonthSSDVFacets              11.21  (9.1%)                   11.69  (7.1%)     4.3% (-11% -  22%)   0.100
OrHighNotLow                      446.78  (8.7%)                  479.82  (7.1%)     7.4% ( -7% -  25%)   0.003
OrHighNotMed                      591.66  (7.6%)                  649.35  (4.8%)     9.8% ( -2% -  23%)   0.000
IntNRQ                            202.12 (17.5%)                  224.77 (28.1%)    11.2% (-29% -  68%)   0.130
OrHighNotHigh                     339.78  (8.0%)                  393.02  (6.7%)    15.7% (  0% -  33%)   0.000
```
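
For anyone who wants to see the two orderings side by side without opening the PR, here is a minimal, self-contained sketch. It does not use Lucene at all: the class name, the `trace` list, and the stubbed `createWeight` and `search` methods are hypothetical stand-ins. It only illustrates that the two variants differ in statement order while handing identical arguments to the final `search` step, and it is not expected to reproduce the regression.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: hypothetical stand-ins, not Lucene's IndexSearcher. */
public class CollectorsOrderingSketch {

  // Stand-in for createWeight; it only records that it ran.
  static String createWeight(List<String> trace) {
    trace.add("createWeight");
    return "weight";
  }

  // Stand-in for the final search call; it describes the arguments it received.
  static String search(String weight, List<Object> collectors) {
    return "search(" + weight + ", collectors.size=" + collectors.size() + ")";
  }

  /** Ordering with no measured impact: the list is allocated after createWeight. */
  public static List<String> allocateAfter(int sliceCount) {
    List<String> trace = new ArrayList<>();
    String weight = createWeight(trace);
    List<Object> collectors = new ArrayList<>(sliceCount);
    trace.add("allocateCollectors");
    trace.add(search(weight, collectors));
    return trace;
  }

  /** Ordering that regresses in the benchmark: the list is allocated before createWeight. */
  public static List<String> allocateBefore(int sliceCount) {
    List<String> trace = new ArrayList<>();
    List<Object> collectors = new ArrayList<>(sliceCount);
    trace.add("allocateCollectors");
    String weight = createWeight(trace);
    trace.add(search(weight, collectors));
    return trace;
  }

  public static void main(String[] args) {
    System.out.println(allocateAfter(4));
    System.out.println(allocateBefore(4));
  }
}
```

The last trace entry is the same for both variants, which is the point: at the source level the reordering is a no-op, so any difference has to come from how the runtime compiles or lays out the method.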