I happened to stumble across this chart:
https://home.apache.org/~mikemccand/lucenebench/PKLookup.html
It shows a pretty drastic drop in the PKLookup benchmark on 5/13. I looked
at the commits between the previous run and that one and tried to use git
bisect, with the benchmark as the test, to find the cause. That proved quite
difficult because of a breaking change involving MemoryCodec that also
required corresponding changes in the benchmark code.

In the end, I think removing MemoryCodec is what caused the performance drop
here, based on this comment in the benchmark code:

'2011-06-26'
   Switched to MemoryCodec for the primary-key 'id' field so that lookups
(either for PKLookup test or for deletions during reopen in the NRT test)
are fast, with no IO.  Also switched to NRTCachingDirectory for the NRT
test, so that small new segments are written only in RAM.

I don't really understand the implications here beyond benchmarks, but it
seems that some essential high-performance capability may have been lost.
Is there an equivalent remaining after MemoryCodec's removal that can be
used for primary keys?

-Mike
