shimpeko commented on PR #15659:
URL: https://github.com/apache/lucene/pull/15659#issuecomment-3847147158
I tried to run wikinightly, but it didn't finish after 10 hours on my laptop,
so I picked just a few dismax tasks instead. Looking at the results, this
change has essentially no effect on the performance of those dismax tasks,
including `DismaxTerm`: every difference is far smaller than the run-to-run
StdDev (roughly 9-17%), and no p-value is below 0.05. It also means that bulk
scoring is not improving the performance of `DismaxTerm`.
```
Task                      QPS baseline   StdDev  QPS my_modified_version   StdDev             Pct diff  p-value
DismaxOrHighMed                 218.28  (15.2%)                   216.35  (14.6%)  -0.9% ( -26% - 34%)    0.767
DismaxTerm                      633.42  (16.9%)                   635.44  (16.2%)   0.3% ( -28% - 40%)    0.923
PKLookup                        230.86  (14.8%)                   232.32  (14.0%)   0.6% ( -24% - 34%)    0.826
DismaxOrHighHigh                202.17  (15.9%)                   205.03  (15.3%)   1.4% ( -25% - 38%)    0.650
FilteredDismaxOrHighMed         140.01  (11.7%)                   142.35  (10.1%)   1.7% ( -18% - 26%)    0.445
FilteredDismaxOrHighHigh         59.48  (12.1%)                    61.13   (8.9%)   2.8% ( -16% - 27%)    0.191
FilteredDismaxTerm              127.87  (13.5%)                   133.05   (8.8%)   4.1% ( -16% - 30%)    0.075
```
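As a sanity check on the table above, here is a minimal Python sketch that recomputes the `Pct diff` column from the two QPS columns and lists any task whose p-value falls below a threshold. The 0.05 threshold and the helper names `pct_diff` and `significant_tasks` are assumptions made for illustration; they are not part of luceneutil.

```python
# Rows copied from the table above: (task, baseline QPS, candidate QPS, p-value).
ROWS = [
    ("DismaxOrHighMed", 218.28, 216.35, 0.767),
    ("DismaxTerm", 633.42, 635.44, 0.923),
    ("PKLookup", 230.86, 232.32, 0.826),
    ("DismaxOrHighHigh", 202.17, 205.03, 0.650),
    ("FilteredDismaxOrHighMed", 140.01, 142.35, 0.445),
    ("FilteredDismaxOrHighHigh", 59.48, 61.13, 0.191),
    ("FilteredDismaxTerm", 127.87, 133.05, 0.075),
]

ALPHA = 0.05  # assumed significance threshold, not something luceneutil enforces


def pct_diff(baseline_qps, candidate_qps):
    """Relative QPS change of the candidate over the baseline, in percent."""
    return 100.0 * (candidate_qps - baseline_qps) / baseline_qps


def significant_tasks(rows, alpha=ALPHA):
    """Names of tasks whose reported p-value falls below alpha."""
    return [task for task, _, _, p in rows if p < alpha]


if __name__ == "__main__":
    for task, base, cand, p in ROWS:
        print(f"{task:<26}{pct_diff(base, cand):+6.1f}%  p={p:.3f}")
    print("significant at", ALPHA, ":", significant_tasks(ROWS))
```

With these rows the list of significant tasks comes out empty, which is consistent with reading the result as "no measurable effect".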
<details>
<summary>Commands to run bench and task detail</summary>
```
util % grep -A 5 'sourceData =' src/python/localrun.py
sourceData = competition.Data(
    "wikimediumall",
    constants.WIKI_MEDIUM_DOCS_LINE_FILE,
    constants.WIKI_MEDIUM_DOCS_COUNT,
    constants.DISMAX_TERM_TASKS_FILE,
)
util % cat src/python/localconstants.py
BASE_DIR = '/Users/shimpei-kodama/github.com/mikemccand/luceneutil/bench_home'
BENCH_BASE_DIR = '/Users/shimpei-kodama/github.com/mikemccand/luceneutil/bench_home/util'
DISMAX_TERM_TASKS_FILE = '%s/tasks/dismax_term_only.tasks' % BENCH_BASE_DIR
util % cd ../lucene_baseline && git show -s --oneline HEAD && cd ../util/
7ebdb9316e5 (HEAD -> main, origin/main, origin/HEAD) Add next minor version 10.5.0
util % cd ../lucene_candidate && git show -s --oneline HEAD && cd ../util
68ada56464 (HEAD -> dismax-bulk-heuristic, origin/dismax-bulk-heuristic) ./gradlew tidy --rerun-tasks
util % python src/python/localrun.py --iterations=50 --warmups=50 -b /Users/shimpei-kodama/github.com/mikemccand/luceneutil/bench_home/lucene_baseline -c /Users/shimpei-kodama/github.com/mikemccand/luceneutil/bench_home/lucene_candidate > result.txt
util % cat tasks/dismax_term_only.tasks
DismaxTerm: 0 +dismaxFields=titleTokenized,body
DismaxTerm: names +dismaxFields=titleTokenized,body
DismaxTerm: nbsp +dismaxFields=titleTokenized,body
DismaxTerm: part +dismaxFields=titleTokenized,body
DismaxTerm: st +dismaxFields=titleTokenized,body
DismaxOrHighHigh: are last +dismaxFields=titleTokenized,body
DismaxOrHighHigh: at united +dismaxFields=titleTokenized,body
DismaxOrHighHigh: but year +dismaxFields=titleTokenized,body
DismaxOrHighHigh: name its +dismaxFields=titleTokenized,body
DismaxOrHighHigh: to but +dismaxFields=titleTokenized,body
DismaxOrHighMed: at mostly +dismaxFields=titleTokenized,body
DismaxOrHighMed: his interview +dismaxFields=titleTokenized,body
DismaxOrHighMed: http 9 +dismaxFields=titleTokenized,body
DismaxOrHighMed: they hard +dismaxFields=titleTokenized,body
DismaxOrHighMed: title bay +dismaxFields=titleTokenized,body
FilteredDismaxTerm: 0 +dismaxFields=titleTokenized,body +filter=5%
FilteredDismaxTerm: names +dismaxFields=titleTokenized,body +filter=5%
FilteredDismaxTerm: nbsp +dismaxFields=titleTokenized,body +filter=5%
FilteredDismaxTerm: part +dismaxFields=titleTokenized,body +filter=5%
FilteredDismaxTerm: st +dismaxFields=titleTokenized,body +filter=5%
FilteredDismaxOrHighHigh: are last +dismaxFields=titleTokenized,body +filter=5%
FilteredDismaxOrHighHigh: at united +dismaxFields=titleTokenized,body +filter=5%
FilteredDismaxOrHighHigh: but year +dismaxFields=titleTokenized,body +filter=5%
FilteredDismaxOrHighHigh: name its +dismaxFields=titleTokenized,body +filter=5%
FilteredDismaxOrHighHigh: to but +dismaxFields=titleTokenized,body +filter=5%
FilteredDismaxOrHighMed: at mostly +dismaxFields=titleTokenized,body +filter=5%
FilteredDismaxOrHighMed: his interview +dismaxFields=titleTokenized,body +filter=5%
FilteredDismaxOrHighMed: http 9 +dismaxFields=titleTokenized,body +filter=5%
FilteredDismaxOrHighMed: they hard +dismaxFields=titleTokenized,body +filter=5%
FilteredDismaxOrHighMed: title bay +dismaxFields=titleTokenized,body +filter=5%
util %
```
</details>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.