shimpeko commented on PR #15659:
URL: https://github.com/apache/lucene/pull/15659#issuecomment-3847147158

   I tried to run wikinightly, but it had not finished after 10 hours on my
   laptop, so I picked a few dismax tasks instead. Judging by the results, this
   change has essentially no effect on the performance of those dismax tasks,
   including `DismaxTerm`. That also means bulk scoring is not improving the
   performance of `DismaxTerm`.
   
   ```
                       Task  QPS baseline   StdDev  QPS my_modified_version   StdDev                Pct diff  p-value
            DismaxOrHighMed        218.28  (15.2%)                   216.35  (14.6%)   -0.9% ( -26% -   34%)    0.767
                 DismaxTerm        633.42  (16.9%)                   635.44  (16.2%)    0.3% ( -28% -   40%)    0.923
                   PKLookup        230.86  (14.8%)                   232.32  (14.0%)    0.6% ( -24% -   34%)    0.826
           DismaxOrHighHigh        202.17  (15.9%)                   205.03  (15.3%)    1.4% ( -25% -   38%)    0.650
    FilteredDismaxOrHighMed        140.01  (11.7%)                   142.35  (10.1%)    1.7% ( -18% -   26%)    0.445
   FilteredDismaxOrHighHigh         59.48  (12.1%)                    61.13   (8.9%)    2.8% ( -16% -   27%)    0.191
         FilteredDismaxTerm        127.87  (13.5%)                   133.05   (8.8%)    4.1% ( -16% -   30%)    0.075
   ```
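
To make the "no effect" reading of the table above easy to re-check, here is a minimal Python sketch that parses the rows and lists any task whose p-value falls below a threshold. The regex, the helper names `parse_rows` and `significant`, and the 0.05 cutoff are my own assumptions for illustration; none of them come from luceneutil.

```python
import re

# Rows copied from the benchmark table above:
# task, baseline QPS, (StdDev), candidate QPS, (StdDev), pct diff, (range), p-value
RESULTS = """\
DismaxOrHighMed 218.28 (15.2%) 216.35 (14.6%) -0.9% ( -26% - 34%) 0.767
DismaxTerm 633.42 (16.9%) 635.44 (16.2%) 0.3% ( -28% - 40%) 0.923
PKLookup 230.86 (14.8%) 232.32 (14.0%) 0.6% ( -24% - 34%) 0.826
DismaxOrHighHigh 202.17 (15.9%) 205.03 (15.3%) 1.4% ( -25% - 38%) 0.650
FilteredDismaxOrHighMed 140.01 (11.7%) 142.35 (10.1%) 1.7% ( -18% - 26%) 0.445
FilteredDismaxOrHighHigh 59.48 (12.1%) 61.13 (8.9%) 2.8% ( -16% - 27%) 0.191
FilteredDismaxTerm 127.87 (13.5%) 133.05 (8.8%) 4.1% ( -16% - 30%) 0.075
"""

# Hypothetical pattern for one result row; whitespace between columns is flexible.
ROW = re.compile(
    r"^\s*(?P<task>\S+)\s+(?P<base>[\d.]+)\s+\([\d.]+%\)\s+"
    r"(?P<cand>[\d.]+)\s+\([\d.]+%\)\s+(?P<diff>-?[\d.]+)%\s+"
    r"\(.*?\)\s+(?P<p>[\d.]+)\s*$"
)


def parse_rows(text):
    """Return (task, baseline_qps, candidate_qps, pct_diff, p_value) tuples."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:  # skip headers and blank lines
            rows.append(
                (m["task"], float(m["base"]), float(m["cand"]),
                 float(m["diff"]), float(m["p"]))
            )
    return rows


def significant(rows, alpha=0.05):
    """Names of tasks whose p-value is below alpha (0.05 is an assumed cutoff)."""
    return [task for task, _base, _cand, _diff, p in rows if p < alpha]


rows = parse_rows(RESULTS)
print(significant(rows))  # prints [] for the rows above
```

With that cutoff no task qualifies; the smallest p-value is 0.075 for `FilteredDismaxTerm`.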
   
   
   
   <details>
   <summary>Commands to run bench and task detail</summary>
   
   ```
   util % grep -A 5 'sourceData =' src/python/localrun.py 
     sourceData = competition.Data(
       "wikimediumall",
       constants.WIKI_MEDIUM_DOCS_LINE_FILE,
       constants.WIKI_MEDIUM_DOCS_COUNT,
       constants.DISMAX_TERM_TASKS_FILE,
     )
   util % cat src/python/localconstants.py 
   BASE_DIR = '/Users/shimpei-kodama/github.com/mikemccand/luceneutil/bench_home'
   BENCH_BASE_DIR = '/Users/shimpei-kodama/github.com/mikemccand/luceneutil/bench_home/util'
   DISMAX_TERM_TASKS_FILE = '%s/tasks/dismax_term_only.tasks' % BENCH_BASE_DIR
   util % cd ../lucene_baseline && git show -s --oneline HEAD && cd ../util/
   7ebdb9316e5 (HEAD -> main, origin/main, origin/HEAD) Add next minor version 10.5.0
   util % cd ../lucene_candidate && git show -s --oneline HEAD && cd ../util 
   68ada56464 (HEAD -> dismax-bulk-heuristic, origin/dismax-bulk-heuristic) ./gradlew tidy --rerun-tasks
   util % python src/python/localrun.py --iterations=50 --warmups=50 -b /Users/shimpei-kodama/github.com/mikemccand/luceneutil/bench_home/lucene_baseline -c /Users/shimpei-kodama/github.com/mikemccand/luceneutil/bench_home/lucene_candidate > result.txt
   util % cat tasks/dismax_term_only.tasks 
   DismaxTerm: 0 +dismaxFields=titleTokenized,body
   DismaxTerm: names +dismaxFields=titleTokenized,body
   DismaxTerm: nbsp +dismaxFields=titleTokenized,body
   DismaxTerm: part +dismaxFields=titleTokenized,body
   DismaxTerm: st +dismaxFields=titleTokenized,body
   
   DismaxOrHighHigh: are last +dismaxFields=titleTokenized,body
   DismaxOrHighHigh: at united +dismaxFields=titleTokenized,body
   DismaxOrHighHigh: but year +dismaxFields=titleTokenized,body
   DismaxOrHighHigh: name its +dismaxFields=titleTokenized,body
   DismaxOrHighHigh: to but +dismaxFields=titleTokenized,body
   
   DismaxOrHighMed: at mostly +dismaxFields=titleTokenized,body
   DismaxOrHighMed: his interview +dismaxFields=titleTokenized,body
   DismaxOrHighMed: http 9 +dismaxFields=titleTokenized,body
   DismaxOrHighMed: they hard +dismaxFields=titleTokenized,body
   DismaxOrHighMed: title bay +dismaxFields=titleTokenized,body
   
   FilteredDismaxTerm: 0 +dismaxFields=titleTokenized,body +filter=5%
   FilteredDismaxTerm: names +dismaxFields=titleTokenized,body +filter=5%
   FilteredDismaxTerm: nbsp +dismaxFields=titleTokenized,body +filter=5%
   FilteredDismaxTerm: part +dismaxFields=titleTokenized,body +filter=5%
   FilteredDismaxTerm: st +dismaxFields=titleTokenized,body +filter=5%
   
   FilteredDismaxOrHighHigh: are last +dismaxFields=titleTokenized,body +filter=5%
   FilteredDismaxOrHighHigh: at united +dismaxFields=titleTokenized,body +filter=5%
   FilteredDismaxOrHighHigh: but year +dismaxFields=titleTokenized,body +filter=5%
   FilteredDismaxOrHighHigh: name its +dismaxFields=titleTokenized,body +filter=5%
   FilteredDismaxOrHighHigh: to but +dismaxFields=titleTokenized,body +filter=5%
   
   FilteredDismaxOrHighMed: at mostly +dismaxFields=titleTokenized,body +filter=5%
   FilteredDismaxOrHighMed: his interview +dismaxFields=titleTokenized,body +filter=5%
   FilteredDismaxOrHighMed: http 9 +dismaxFields=titleTokenized,body +filter=5%
   FilteredDismaxOrHighMed: they hard +dismaxFields=titleTokenized,body +filter=5%
   FilteredDismaxOrHighMed: title bay +dismaxFields=titleTokenized,body +filter=5%
   util %
   ```
   </details>
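
The tasks file in the details above follows a regular pattern: each `Filtered*` entry is the unfiltered entry with ` +filter=5%` appended. As a rough sketch of how such a file could be regenerated, assuming that pattern holds, the following Python builds the same 30 lines. `build_tasks`, `QUERIES`, and `FIELDS` are hypothetical names of mine and are not part of luceneutil.

```python
# Hypothetical generator for a file like tasks/dismax_term_only.tasks.
# Query texts and field names are copied from the listing above.
FIELDS = "titleTokenized,body"

QUERIES = {
    "DismaxTerm": ["0", "names", "nbsp", "part", "st"],
    "DismaxOrHighHigh": ["are last", "at united", "but year", "name its", "to but"],
    "DismaxOrHighMed": ["at mostly", "his interview", "http 9", "they hard", "title bay"],
}


def build_tasks(queries, fields=FIELDS, filter_pct=5):
    """Return the tasks file text: unfiltered blocks first, then filtered ones."""
    blocks = []
    variants = (("", ""), ("Filtered", f" +filter={filter_pct}%"))
    for prefix, suffix in variants:
        for category, texts in queries.items():
            lines = [
                f"{prefix}{category}: {text} +dismaxFields={fields}{suffix}"
                for text in texts
            ]
            blocks.append("\n".join(lines))
    # Blank line between categories, as in the original file.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(build_tasks(QUERIES), end="")
```

Keeping the query texts in one dictionary means the filtered and unfiltered variants cannot drift apart when terms are added or removed.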


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

