alexmm-amzn opened a new pull request, #15210:
URL: https://github.com/apache/lucene/pull/15210
### Description
Extends the `FirstPassGroupingCollector` to support pruning (for numeric
sort fields using `competitiveIterator`) and skipping of non-competitive
documents (for relevance score sorting using `Scorable#setMinCompetitiveScore`).
Both optimizations are enabled automatically and reduce the number of hits
the collector has to process when circumstances allow.
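
To illustrate the relevance-sort case, here is a minimal, self-contained sketch of the idea. It is a toy model and not the code in this PR: `FirstPassSkipSketch`, `collect` and `countSkippable` are hypothetical names, groups are plain strings, and the returned threshold stands in for the value a real collector would pass to `Scorable#setMinCompetitiveScore`. The numeric-sort pruning via `competitiveIterator` is not modeled here.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of first-pass grouping by relevance score; not Lucene code. */
public class FirstPassSkipSketch {
    private final int topNGroups;
    // Best score seen so far for each group currently kept.
    private final Map<String, Float> bestScoreByGroup = new HashMap<>();

    public FirstPassSkipSketch(int topNGroups) {
        this.topNGroups = topNGroups;
    }

    /** Offers one hit and returns the score below which later hits cannot matter. */
    public float collect(String group, float score) {
        Float best = bestScoreByGroup.get(group);
        if (best != null) {
            // Known group: only a better score changes its rank.
            if (score > best) {
                bestScoreByGroup.put(group, score);
            }
        } else if (bestScoreByGroup.size() < topNGroups) {
            // Still room: every new group is kept.
            bestScoreByGroup.put(group, score);
        } else {
            // Full: a new group must beat the current bottom group to get in.
            String bottom = bottomGroup();
            if (score > bestScoreByGroup.get(bottom)) {
                bestScoreByGroup.remove(bottom);
                bestScoreByGroup.put(group, score);
            }
        }
        return minCompetitiveScore();
    }

    /** Returns 0 until the top-N groups are full, then the smallest float above the bottom score. */
    public float minCompetitiveScore() {
        if (bestScoreByGroup.size() < topNGroups) {
            return 0f;
        }
        float bottomScore = bestScoreByGroup.get(bottomGroup());
        // Ties lose against the earlier hit, so only strictly higher scores are competitive.
        return Math.nextUp(bottomScore);
    }

    private String bottomGroup() {
        String bottom = null;
        for (Map.Entry<String, Float> e : bestScoreByGroup.entrySet()) {
            if (bottom == null || e.getValue() < bestScoreByGroup.get(bottom)) {
                bottom = e.getKey();
            }
        }
        return bottom;
    }

    /** Replays a hit stream and counts the hits that fall below the running threshold. */
    public static int countSkippable(String[] groups, float[] scores, int topNGroups) {
        FirstPassSkipSketch collector = new FirstPassSkipSketch(topNGroups);
        float min = 0f;
        int skipped = 0;
        for (int i = 0; i < groups.length; i++) {
            if (scores[i] < min) {
                skipped++; // a scorer that honors the threshold would never surface this hit
                continue;
            }
            min = collector.collect(groups[i], scores[i]);
        }
        return skipped;
    }
}
```

In this model the threshold stays at 0 until the top-N groups are filled, so nothing can be skipped before that point. Afterwards it only ever rises, which is why the benefit depends on the order in which hits arrive.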
@jainankitk Are we fine with enabling this by default, or does it need to be
configurable (e.g. via a configurable hit threshold)?
Below are benchmark results from `luceneutil` for the `TermBGroup1M` scenario
(which combines first- and second-pass grouping), run with a modified
`wikimedium.10M.nostopwords.tasks` file. This scenario sorts by relevance
score.
```
> grep TermBGroup1M tasks/wikimedium500.tasks > tasks/wikimedium.10M.nostopwords.tasks
> python src/python/localrun.py -source wikimediumall
```
Running on `m6a.2xlarge` using Corretto 24:
```
        Task  QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff              p-value
    PKLookup        161.09  (12.8%)                   156.56  (11.2%)  -2.8% ( -23% -  24%)    0.460
TermBGroup1M         11.47  (14.8%)                    13.44  (13.7%)  17.1% (  -9% -  53%)    0.000
```
=> ~17% overall performance improvement (first+second pass).
@jpountz I'm getting some rare test failures for `TestGrouping` caused by
the `assert canSetMinCompetitiveScore` assertion in
`AssertingScorer#setMinCompetitiveScore`, even though the
`FirstPassGroupingCollector` uses `ScoreMode.TOP_SCORES` in all configurations
when it calls `Scorable#setMinCompetitiveScore`. Is this a known issue?
Reproduce with: `gradlew test --tests TestGrouping.testRandom
-Dtests.seed=EC2EC279F564DD82 -Dtests.locale=de-AT
-Dtests.timezone=America/St_Thomas -Dtests.asserts=true
-Dtests.file.encoding=UTF-8`