Bryan Beaudreault created HBASE-27587:
-----------------------------------------

             Summary: L1 cache leaks index blocks over time when under-subscribed
                 Key: HBASE-27587
                 URL: https://issues.apache.org/jira/browse/HBASE-27587
             Project: HBase
          Issue Type: Bug
            Reporter: Bryan Beaudreault


Let's say you have CombinedBlockCache enabled: DATA blocks go to L2, while INDEX/BLOOM blocks go to L1. Your regionserver has an index size of 2 GB and a bloom size of 1 GB, so you really only need around 3 GB of L1 to fully hold all of the "L1 candidates".
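For reference, this is the setup you get once a BucketCache is configured alongside the on-heap LRU cache. A minimal hbase-site.xml sketch (the sizes here are illustrative for this scenario, not a recommendation; 0.1 of a ~32 GB heap is roughly the 3 GB of L1 discussed above):

```xml
<!-- Illustrative sizes only; assumes a ~32 GB heap, so 0.1 is ~3.2 GB of L1 -->
<property>
  <name>hfile.block.cache.size</name>
  <value>0.1</value> <!-- L1 (on-heap LRU cache) as a fraction of heap -->
</property>
<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>offheap</value> <!-- configuring a BucketCache enables CombinedBlockCache -->
</property>
<property>
  <name>hbase.bucketcache.size</name>
  <value>8192</value> <!-- L2 (BucketCache) size in MB -->
</property>
```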

When the data set does not fit into the cache, LRU eviction will keep it under the max. But in the above scenario, if you configure 6 GB for L1 (3 GB more than needed), over time you will end up filling that entire 6 GB with stale INDEX blocks. Once you reach the max, LRU will evict the oldest ones.

Since the leak is bounded by the configured max L1 size, this isn't a huge issue, but it results in heap waste. Under high heap allocation rates, if you haven't left enough buffer outside memstore, L1, etc., you will start seeing GC pressure. The L1 leak then becomes a little more problematic, because longer-lived regionservers (which have leaked closer to the max L1 size) end up with less spare buffer than more recently restarted regionservers.

The best fix is to size your L1 appropriately so there is not a lot of excess, but this can be painful to maintain over time as clusters shrink or grow, or data shape changes. It would be much better if the L1 did not leak, so you don't have to tune it so finely.

I haven't fully figured out where the leak comes from, but I think it's related to compactions. Perhaps the INDEX blocks are not being evicted as their hfiles are compacted away. The leak is very linear over time in our experience.
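To illustrate the suspected dynamic, here is a small standalone simulation (not HBase code; block counts and names are arbitrary units): an LRU cache of INDEX blocks keyed by (hfile, offset), where compactions rewrite the live blocks under a new hfile name but the old hfile's blocks are never proactively evicted, only pushed out by LRU pressure at max size.

```python
from collections import OrderedDict

class LruCache:
    """Minimal LRU cache: evicts oldest entries only once max size is hit."""
    def __init__(self, max_blocks):
        self.max_blocks = max_blocks
        self.blocks = OrderedDict()

    def put(self, key):
        self.blocks[key] = True
        self.blocks.move_to_end(key)          # mark most-recently-used
        while len(self.blocks) > self.max_blocks:
            self.blocks.popitem(last=False)   # LRU eviction, only at max

cache = LruCache(max_blocks=600)  # "6 GB" of L1, in abstract block units
live_index_blocks = 300           # only "3 GB" of live INDEX/BLOOM blocks

# Each "compaction" generation rewrites the live blocks under a new hfile
# name; the previous hfile's blocks stay cached until LRU pressure hits.
for generation in range(10):
    for offset in range(live_index_blocks):
        cache.put((f"hfile-{generation}", offset))

live = sum(1 for (hfile, _) in cache.blocks if hfile == "hfile-9")
print(len(cache.blocks), live)  # 600 300: cache full, but only half is live
```

The cache fills to its configured max and stays there even though the live index never exceeds half that size, which matches the linear growth toward max L1 size described above.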



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
