I don't think the index needs to be so large that the terms dictionary no longer fits in the OS cache to reproduce the difference, but you may indeed need a larger index. On my end, I mostly use wikimedium10M or wikimediumall (and wikibigall if I need to test phrases), since I get more noise with smaller indices. I added an annotation; it should be picked up the next time the benchmarks run.
I also pushed a change to account for the new default codec. However, I did not add backward-codecs.jar to the classpath, so you will need to rebuild the index you use for benchmarking with the Lucene80 codec instead of Lucene70.

On Fri, Aug 24, 2018 at 02:03, Michael Sokolov <msoko...@gmail.com> wrote:

> I think the benchmarks need updating after LUCENE-8461. I got them working
> again by replacing lucene70 with lucene80 everywhere except for the
> DocValues formats, and adding the backward-codecs.jar to the benchmarks
> build. I'm not sure that was really the right way to go about this? After
> that I did try switching to use FST50 for this PKLookup benchmark (see
> below), but it did not recover the lost perf.
>
> diff --git a/src/python/nightlyBench.py b/src/python/nightlyBench.py
> index b42fe84..5807e49 100644
> --- a/src/python/nightlyBench.py
> +++ b/src/python/nightlyBench.py
> @@ -699,7 +699,7 @@ def run():
> - idFieldPostingsFormat='Lucene50',
> + idFieldPostingsFormat='FST50',
>
>
> On Thu, Aug 23, 2018 at 5:52 PM Michael Sokolov <msoko...@gmail.com>
> wrote:
>
>> OK thanks. I guess this benchmark must be run on a large-enough index
>> that it doesn't fit entirely in RAM already anyway? When I ran it locally
>> using the vanilla benchmark instructions, I believe the generated index was
>> quite small (wikimedium10k). At any rate, I don't have any specific use
>> case yet, just thinking about some possibilities related to primary key
>> lookup and came across this anomaly. Perhaps at least it deserves an
>> annotation on the benchmark graph.
>>
>