alexmm-amzn opened a new pull request, #15210:
URL: https://github.com/apache/lucene/pull/15210

   ### Description
   
   Extends the `FirstPassGroupingCollector` to support pruning (for numeric 
sort fields using `competitiveIterator`) and skipping of non-competitive 
documents (for relevance score sorting using `Scorable#setMinCompetitiveScore`).
   
   Both optimizations are enabled automatically and, where applicable, reduce 
the number of hits the collector has to process.
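   
   To illustrate the relevance-sorted case, here is a minimal, Lucene-free 
sketch of the idea (not the code from this PR). The class `FirstPassSkipSketch` 
and all of its methods are hypothetical names, and groups are plain strings. 
Once the collector holds `topNGroups` groups, the best score of the weakest 
group acts as a lower bound: a hit at or below that bound can neither create a 
competitive new group nor improve an existing one. The real collector would 
pass such a bound to `Scorable#setMinCompetitiveScore` so the scorer can skip 
documents; the sketch only counts them.
   
   ```java
   import java.util.HashMap;
   import java.util.Map;
   import java.util.Set;

   // Hypothetical sketch, not Lucene code: tracks the best score per group
   // and skips hits that cannot change the current top-N groups.
   final class FirstPassSkipSketch {
     private final int topNGroups;
     private final Map<String, Float> bestScorePerGroup = new HashMap<>();
     private float minCompetitiveScore = Float.NEGATIVE_INFINITY;
     private int skipped = 0;

     FirstPassSkipSketch(int topNGroups) {
       if (topNGroups < 1) {
         throw new IllegalArgumentException("topNGroups must be >= 1");
       }
       this.topNGroups = topNGroups;
     }

     void collect(String group, float score) {
       boolean full = bestScorePerGroup.size() == topNGroups;
       // With all slots taken, a hit at or below the bound is non-competitive.
       if (full && score <= minCompetitiveScore) {
         skipped++;
         return;
       }
       Float current = bestScorePerGroup.get(group);
       if (current != null) {
         // Known group: only a strictly better score changes anything.
         if (score > current) {
           bestScorePerGroup.put(group, score);
         }
       } else {
         // New group: evict the weakest group first if all slots are taken.
         if (full) {
           bestScorePerGroup.remove(weakestGroup());
         }
         bestScorePerGroup.put(group, score);
       }
       // Refresh the bound once (or while) all slots are filled.
       if (bestScorePerGroup.size() == topNGroups) {
         minCompetitiveScore = bestScorePerGroup.get(weakestGroup());
       }
     }

     // Linear scan for the group with the lowest best score (fine for a sketch).
     private String weakestGroup() {
       String weakest = null;
       for (Map.Entry<String, Float> e : bestScorePerGroup.entrySet()) {
         if (weakest == null || e.getValue() < bestScorePerGroup.get(weakest)) {
           weakest = e.getKey();
         }
       }
       return weakest;
     }

     Set<String> groups() { return bestScorePerGroup.keySet(); }
     float minCompetitiveScore() { return minCompetitiveScore; }
     int skipped() { return skipped; }
   }
   ```
   
   With `topNGroups = 2`, collecting `("a", 3.0)` and `("b", 5.0)` sets the 
bound to 3.0, so a later hit `("c", 1.0)` is skipped without touching the group 
map. The linear scan in `weakestGroup()` keeps the sketch short; a sorted 
structure would be the natural choice for larger values of `topNGroups`.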
   
   @jainankitk Are we fine with enabling this by default, or does it need to be 
configurable (e.g. via a configurable hit threshold)?
   
   Benchmark results were obtained with `luceneutil` for the `TermBGroup1M` 
scenario (which combines first- and second-pass grouping) using a modified 
`wikimedium.10M.nostopwords.tasks` job. This scenario sorts by relevance 
score.
   
   ```
   > grep TermBGroup1M tasks/wikimedium500.tasks > tasks/wikimedium.10M.nostopwords.tasks
   > python src/python/localrun.py -source wikimediumall
   ```
   
   Running on `m6a.2xlarge` using Corretto 24:
   
   ```
           Task  QPS baseline   StdDev  QPS my_modified_version   StdDev               Pct diff  p-value
       PKLookup        161.09  (12.8%)                   156.56  (11.2%)  -2.8% ( -23% -   24%)    0.460
   TermBGroup1M         11.47  (14.8%)                    13.44  (13.7%)  17.1% (  -9% -   53%)    0.000
   ```
   
   => ~17% overall performance improvement (first+second pass).
   
   @jpountz I'm getting some rare test failures for `TestGrouping` caused by 
the `assert canSetMinCompetitiveScore` assertion in 
`AssertingScorer#setMinCompetitiveScore`, even though the 
`FirstPassGroupingCollector` uses `ScoreMode.TOP_SCORES` in all configurations 
when it calls `Scorable#setMinCompetitiveScore`. Is this a known issue?
   
   Reproduce with: `gradlew test --tests TestGrouping.testRandom 
-Dtests.seed=EC2EC279F564DD82 -Dtests.locale=de-AT 
-Dtests.timezone=America/St_Thomas -Dtests.asserts=true 
-Dtests.file.encoding=UTF-8`
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

