shimpeko commented on PR #15659: URL: https://github.com/apache/lucene/pull/15659#issuecomment-3853027797
I added test cases that are close to my query, which combines dismax + constant_score: https://github.com/shimpeko/luceneutil/pull/1/changes. Looking at the benchmark results below, I think I can say that the changes in this PR have a significant positive impact on the performance of this specific type of query.

> As far as block-max optimizations are concerned, DisjunctionBulkMaxScorer tracks the min competitive score and passes it to its sub clauses whenever scoring a window

This seems true, so I don't yet clearly understand why using DisjunctionBulkMaxScorer causes a regression in this particular case. @jpountz do you have any idea?

Results (50 warmups, 50 iterations):

```
Task                      QPS baseline   StdDev  QPS my_modified_version   StdDev Pct diff                p-value
DismaxTerm                     1034.68  (11.6%)                   935.85  (11.7%)    -9.6% ( -29% -  15%)   0.000
DismaxOrHighMed                 224.64  (16.2%)                   206.15  (16.7%)    -8.2% ( -35% -  29%)   0.012
FilteredDismaxTerm              201.44   (9.5%)                   189.45  (12.1%)    -5.9% ( -25% -  17%)   0.006
FilteredDismaxOrHighHigh         61.88  (14.1%)                    59.53  (11.3%)    -3.8% ( -25% -  25%)   0.137
DismaxOrHighHigh                 99.12  (13.0%)                    95.37  (12.9%)    -3.8% ( -26% -  25%)   0.143
PKLookup                        240.78  (14.9%)                   232.55  (15.0%)    -3.4% ( -29% -  31%)   0.254
FilteredDismaxOrHighMed         170.69  (14.1%)                   166.51  (13.5%)    -2.5% ( -26% -  29%)   0.374
DisMaxCsTerm1                  1814.59  (12.8%)                  2010.12  (16.9%)    10.8% ( -16% -  46%)   0.000
DisMaxCSTerm20                  169.78  (12.3%)                   307.85  (43.2%)    81.3% (  22% - 156%)   0.000
```

<details>
<summary>Commands to run bench and task detail</summary>

```
util % cd ../lucene_candidate && git show -s --oneline HEAD && cd ../util
68ada56464 (HEAD -> dismax-bulk-heuristic, origin/dismax-bulk-heuristic) ./gradlew tidy --rerun-tasks
util % cd ../lucene_baseline && git show -s --oneline HEAD && cd ../util
7ebdb9316e5 (HEAD -> main, origin/main, origin/HEAD) Add next minor version 10.5.0
util % cd ../lucene_candidate && git show -s --oneline HEAD && cd ../util
68ada56464 (HEAD -> dismax-bulk-heuristic, origin/dismax-bulk-heuristic) ./gradlew tidy --rerun-tasks
util % grep -A 5 'sourceData =' src/python/localrun.py
  sourceData = competition.Data(
    "wikimediumall",
    constants.WIKI_MEDIUM_DOCS_LINE_FILE,
    constants.WIKI_MEDIUM_DOCS_COUNT,
    constants.DISMAX_TASKS_FILE,
  )
util % cat src/python/localconstants.py
import os

BASE_DIR = '/Users/shimpei-kodama/github.com/mikemccand/luceneutil/bench_home'
BENCH_BASE_DIR = '/Users/shimpei-kodama/github.com/mikemccand/luceneutil/bench_home/util'
DISMAX_TASKS_FILE = '%s/tasks/dismax_constantscore.tasks' % BENCH_BASE_DIR

#JAVA_HOME = os.environ.get("JAVA_HOME")
#java_bin = JAVA_HOME + "/bin/" if JAVA_HOME else ""
#if java_bin:
#  print("Using java from: %s" % java_bin)
#if "JAVA_EXE" not in globals():
#  JAVA_EXE = f"{java_bin}java"
#if "JAVAC_EXE" not in globals():
#  JAVAC_EXE = f"{java_bin}javac"
#if "JAVA_COMMAND" not in globals():
#  JAVA_COMMAND = "%s -server -Xms2g -Xmx2g --add-modules jdk.incubator.vector -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParallelGC -Dlucene.dismax.debug=true" % JAVA_EXE
#else:
#  print("use java command %s" % JAVA_COMMAND)  # pyright: ignore[reportUndefinedVariable] # TODO: fix how variables are managed here
util % python src/python/localrun.py --iteration=50 --warmups=50 -b /Users/shimpei-kodama/github.com/mikemccand/luceneutil/bench_home/lucene_baseline -c /Users/shimpei-kodama/github.com/mikemccand/luceneutil/bench_home/lucene_candidate > result.txt
```

</details>
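
For anyone reading the table, the `Pct diff` column appears to be the relative change in mean QPS from baseline to candidate. The following is a minimal sketch under that assumption; the helper name `pct_diff` is illustrative and is not part of luceneutil. It reproduces the reported values from the two QPS columns.

```python
def pct_diff(baseline_qps: float, candidate_qps: float) -> float:
    """Relative change in QPS, in percent (assumed definition of 'Pct diff')."""
    return (candidate_qps - baseline_qps) / baseline_qps * 100.0


# (baseline QPS, candidate QPS) copied from three rows of the result table.
rows = {
    "DismaxTerm": (1034.68, 935.85),
    "DisMaxCsTerm1": (1814.59, 2010.12),
    "DisMaxCSTerm20": (169.78, 307.85),
}

for task, (baseline, candidate) in rows.items():
    print(f"{task}: {pct_diff(baseline, candidate):+.1f}%")
```

A negative value means the candidate is slower than the baseline for that task, and a positive value means it is faster.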
