Github user ben-manes commented on the issue: https://github.com/apache/metron/pull/940

Internally, Guava uses a `ConcurrentLinkedQueue` and an `AtomicInteger` to record its size, per segment. When a read occurs, it is recorded in the queue and later drained under the segment's lock (via tryLock) to replay the events. This is similar to Caffeine, which uses optimized structures instead. I intended the CLQ and counter as baseline scaffolding for replacement, since it is an obvious bottleneck, but I could never get it replaced despite advocating for it.

The penalty of draining the buffers is amortized, but unfortunately this buffer isn't capped. Since a larger cache yields a higher hit rate, reads are recorded more often. Perhaps the contention there and the penalty of draining the queue is more observable than a cache miss. That's still surprising, since a cache miss usually means more expensive I/O. Is the loader doing expensive work in your case?

Caffeine gets around this problem by using more optimal buffers and being lossy (on reads only) if it can't keep up. By default it delegates the amortized maintenance work to a ForkJoinPool to avoid user-facing latencies, since you'll want those variances to be tight. Much of that could be back-ported onto Guava for a nice boost.
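To make the record-then-drain pattern concrete, here is a minimal sketch of a per-segment read buffer in the style described above. The class name, the `DRAIN_THRESHOLD` value, and the helper methods are illustrative assumptions, not Guava's actual code; the point is that reads go into an unbounded `ConcurrentLinkedQueue` and are replayed only when a thread wins the segment's `tryLock`:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch (not Guava's real LocalCache) of one cache segment's
// read buffer: reads are queued lock-free, then replayed under the segment
// lock when a drain threshold is crossed.
final class Segment<K> {
  private final Queue<K> readBuffer = new ConcurrentLinkedQueue<>();
  private final AtomicInteger readCount = new AtomicInteger();
  private final ReentrantLock lock = new ReentrantLock();
  private static final int DRAIN_THRESHOLD = 64; // illustrative value

  /** Called on every cache hit; records the read for later replay. */
  void recordRead(K key) {
    readBuffer.add(key); // unbounded: never drops a read, hence the contention
    if (readCount.incrementAndGet() >= DRAIN_THRESHOLD) {
      tryDrain();
    }
  }

  /** Replays buffered reads under the segment lock, if it is free. */
  void tryDrain() {
    if (lock.tryLock()) { // skip the drain if another thread holds the lock
      try {
        K key;
        while ((key = readBuffer.poll()) != null) {
          readCount.decrementAndGet();
          applyRead(key); // reorder the entry in the access (LRU) queue
        }
      } finally {
        lock.unlock();
      }
    }
  }

  /** Placeholder for updating the eviction order for this key. */
  private void applyRead(K key) { /* no-op in this sketch */ }

  /** Number of reads buffered but not yet replayed (for inspection). */
  int pendingReads() {
    return readCount.get();
  }
}
```

Because the queue has no cap, a hot cache keeps feeding it faster than the amortized drains can empty it; a lossy ring buffer, as in Caffeine, would instead drop read notifications under that pressure.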