Bryan Beaudreault created HBASE-27587:
-----------------------------------------
Summary: L1 cache leaks index blocks over time when undersubscribed
Key: HBASE-27587
URL: https://issues.apache.org/jira/browse/HBASE-27587
Project: HBase
Issue Type: Bug
Reporter: Bryan Beaudreault
Let's say you have CombinedBlockCache enabled: DATA blocks go to L2, while
INDEX/BLOOM blocks go to L1. Your regionserver has an index size of 2gb and a
bloom size of 1gb, so you really only need around 3gb of L1 to fully hold all
of the "L1 candidates".
When the data set does not fit into the cache, LRU eviction will keep it under
max. But in the above scenario, if you configure 6gb for L1 (3gb more than
needed), over time you will end up filling that entire 6gb with old INDEX
blocks. Only once you reach max will LRU start evicting the oldest ones.
Since the leak is bounded by the configured max L1 size, this isn't a huge
issue, but it results in heap waste. Under high heap allocation rates, if you
haven't left enough buffer outside of memstore, L1, etc., you will start seeing
GC pressure. The L1 leak then becomes a little more problematic, because you
end up in a situation where longer-lived regionservers (which have leaked
closer to the max L1 size) have less buffer available than more recently
restarted regionservers.
The best fix is to size your L1 appropriately so there is not a lot of excess,
but this can be painful to maintain over time as clusters shrink or grow, or as
data shape changes. It'd be a lot better if L1 did not leak, so you don't have
to tune it so finely.
I haven't fully figured out where the leak comes from, but I think it's related
to compactions. Perhaps the INDEX blocks are not being evicted as hfiles are
compacted away. The leak is very linear over time in our experience.
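The suspected mechanism can be sketched as follows. This is an illustration of the hypothesis only, not HBase code: it models a cache keyed by (hfile name, offset), mirroring HBase's BlockCacheKey, plus the per-hfile eviction sweep that should run when a compacted-away hfile is removed. If that sweep is skipped or misses INDEX blocks, those blocks linger until overall LRU pressure finally ages them out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the eviction-on-compaction path the ticket suspects
// is incomplete. Not the real LruBlockCache implementation.
public class LeakSketch {
    // Blocks are keyed by (hfileName, offset), like HBase's BlockCacheKey.
    record BlockKey(String hfileName, long offset) {}

    private final Map<BlockKey, byte[]> cache = new ConcurrentHashMap<>();

    void cacheBlock(String hfile, long offset, byte[] block) {
        cache.put(new BlockKey(hfile, offset), block);
    }

    /**
     * Evict every cached block belonging to the given hfile. If this sweep
     * does not run (or does not cover INDEX blocks) when a compaction
     * archives the hfile, its blocks stay cached with no future readers.
     */
    int evictBlocksByHfileName(String hfile) {
        int evicted = 0;
        for (BlockKey key : cache.keySet()) {
            if (key.hfileName().equals(hfile) && cache.remove(key) != null) {
                evicted++;
            }
        }
        return evicted;
    }

    int size() { return cache.size(); }
}
```

If the sweep is never called for a compacted hfile, its INDEX blocks remain resident, which would produce exactly the slow linear growth described above.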
--
This message was sent by Atlassian Jira
(v8.20.10#820010)