I don't think you need an index so large that the terms dictionary doesn't
fit in the OS cache to reproduce the difference, but you might indeed need
a larger index. I use wikimedium10M or wikimediumall most of the time (and
wikibigall if I need to test phrases), since I get more noise with smaller
indices. I added an annotation; it should be picked up the next time the
benchmarks run.
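To tell a real difference apart from run-to-run noise, one rough approach is
to repeat each benchmark several times and compare the gap between the means
with the spread of the runs. A minimal sketch, assuming you already have
per-run QPS numbers for a baseline and a candidate; the function names and the
default threshold k=2 are hypothetical, not part of luceneutil:

```python
import statistics


def coefficient_of_variation(qps_runs):
    """Relative run-to-run noise: sample standard deviation divided by the mean."""
    if len(qps_runs) < 2:
        raise ValueError("need at least two runs to estimate noise")
    return statistics.stdev(qps_runs) / statistics.mean(qps_runs)


def difference_exceeds_noise(baseline_runs, candidate_runs, k=2.0):
    """Return True if the gap between the mean QPS values is larger than
    k times the average of the two sample standard deviations.

    This is a simple heuristic, not a formal significance test.
    """
    diff = statistics.mean(candidate_runs) - statistics.mean(baseline_runs)
    spread = (statistics.stdev(baseline_runs) + statistics.stdev(candidate_runs)) / 2
    return abs(diff) > k * spread
```

With this kind of check, higher noise on a smaller index shows up directly as
a larger spread, so the same gap is less likely to clear the threshold.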

I also pushed a change to account for the fact that the default codec
changed. However, I did not add backward-codecs.jar to the classpath, so you
should rebuild the index you use for benchmarking so that it uses the
Lucene80 codec instead of Lucene70.

On Fri, Aug 24, 2018 at 02:03, Michael Sokolov <msoko...@gmail.com> wrote:

> I think the benchmarks need updating after LUCENE-8461. I got them working
> again by replacing lucene70 with lucene80 everywhere except for the
> DocValues formats, and by adding backward-codecs.jar to the benchmarks
> build. I'm not sure that was really the right way to go about it. After
> that I did try switching this PKLookup benchmark to FST50 (see below), but
> it did not recover the lost performance.
>
> diff --git a/src/python/nightlyBench.py b/src/python/nightlyBench.py
> index b42fe84..5807e49 100644
> --- a/src/python/nightlyBench.py
> +++ b/src/python/nightlyBench.py
> @@ -699,7 +699,7 @@ def run():
> -                                  idFieldPostingsFormat='Lucene50',
> +                                  idFieldPostingsFormat='FST50',
>
>
> On Thu, Aug 23, 2018 at 5:52 PM Michael Sokolov <msoko...@gmail.com>
> wrote:
>
>> OK, thanks. I guess this benchmark must be run on an index large enough
>> that it doesn't already fit entirely in RAM? When I ran it locally using
>> the vanilla benchmark instructions, I believe the generated index was
>> quite small (wikimedium10k). At any rate, I don't have a specific use
>> case yet; I was just thinking about some possibilities related to primary
>> key lookup and came across this anomaly. Perhaps it at least deserves an
>> annotation on the benchmark graph.
>>
>
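
The renaming described in the first quoted message (Lucene70 to Lucene80
everywhere except for the DocValues formats) could be scripted roughly as
below. This is a hypothetical helper, not part of luceneutil. It assumes the
names appear as bare strings such as 'Lucene70' or 'Lucene70DocValuesFormat';
the match is case-sensitive, so lowercase package names such as lucene70 are
left untouched and would need separate handling.

```python
import re

# Matches a bare "Lucene70" name and captures whatever word characters follow,
# e.g. "Lucene70" -> "" and "Lucene70DocValuesFormat" -> "DocValuesFormat".
_LUCENE70_NAME = re.compile(r"\bLucene70(\w*)")


def bump_codec_names(source):
    """Hypothetical helper: rename Lucene70 codec/format names to Lucene80,
    keeping DocValues formats on Lucene70 as described in the thread."""

    def rename(match):
        suffix = match.group(1)
        if suffix.startswith("DocValues"):
            return match.group(0)  # leave DocValues formats unchanged
        return "Lucene80" + suffix

    return _LUCENE70_NAME.sub(rename, source)
```

For example, `bump_codec_names("codec='Lucene70'")` returns `"codec='Lucene80'"`,
while `"Lucene70DocValuesFormat"` comes back unchanged.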
