iprithv commented on PR #16069:
URL: https://github.com/apache/lucene/pull/16069#issuecomment-4493275247

   > Thanks @iprithv, getting closer :)
   > 
   > Can you run the luceneutil wikibigall benchmarks (from 
https://github.com/mikemccand/luceneutil) and post the results here? That 
should give us an idea of the real-world impact of these changes.
   
   I couldn’t find the wikibigall dataset. The file 
`enwiki-20120502-lines-with-random-label.txt` doesn’t seem to be available 
anymore. It looks like the old URL in `constants.py` is gone, and 
`initial_setup.py` only downloads wikimedium now (maybe related to 
https://github.com/apache/lucene/issues/13647). I also checked for mirrors 
but didn’t find anything.
   
   So instead, I ran the wikimediumall benchmark (33M docs, same task file, 
5 JVM iterations).
   
   The results show no real regressions in disjunction queries:
   
   task | baseline qps | candidate qps | diff | p-value
   -- | -- | -- | -- | --
   OrHighHigh | 87.83 | 86.23 | -1.8% | 0.808
   OrHighMed | 244.14 | 238.30 | -2.4% | 0.556
   OrHighLow | 925.61 | 908.15 | -1.9% | 0.571
   OrNotHighHigh | 505.40 | 507.37 | +0.4% | 0.822
   OrNotHighMed | 363.62 | 366.20 | +0.7% | 0.710
   OrNotHighLow | 704.81 | 716.40 | +1.6% | 0.337
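   
   As a sanity check on the table above, here is a minimal Python sketch that recomputes the diff column from the two QPS columns and flags any row whose p-value falls below a significance threshold. The 0.05 threshold and the helper names (`pct_diff`, `summarize`) are assumptions for illustration; they are not part of luceneutil or of the benchmark output.

```python
# Rows copied from the table above: (task, baseline QPS, candidate QPS, p-value).
RESULTS = [
    ("OrHighHigh", 87.83, 86.23, 0.808),
    ("OrHighMed", 244.14, 238.30, 0.556),
    ("OrHighLow", 925.61, 908.15, 0.571),
    ("OrNotHighHigh", 505.40, 507.37, 0.822),
    ("OrNotHighMed", 363.62, 366.20, 0.710),
    ("OrNotHighLow", 704.81, 716.40, 0.337),
]

# Assumed significance threshold; the original comment does not state one.
ALPHA = 0.05


def pct_diff(baseline, candidate):
    """Relative change of candidate vs. baseline QPS, in percent."""
    return (candidate - baseline) / baseline * 100.0


def summarize(rows, alpha=ALPHA):
    """Return (task, diff rounded to 0.1%, is_significant) for each row."""
    return [
        (task, round(pct_diff(base, cand), 1), p < alpha)
        for task, base, cand, p in rows
    ]


if __name__ == "__main__":
    for task, diff, significant in summarize(RESULTS):
        print(f"{task:<14} {diff:+.1f}%  significant={significant}")
```

   Under that assumed threshold, none of the six rows is flagged, which is consistent with reading the differences as noise.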
   
   If there’s another place to get the wikibigall dataset, let me know and I 
can rerun with that. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

