bbeaudreault commented on code in PR #4410:
URL: https://github.com/apache/hbase/pull/4410#discussion_r927747771


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockCacheFactory.java:
##########
@@ -113,7 +117,17 @@ public static BlockCache createBlockCache(Configuration 
conf) {
         LOG.warn(
           "From HBase 2.0 onwards only combined mode of LRU cache and bucket 
cache is available");
       }
-      return bucketCache == null ? l1Cache : new CombinedBlockCache(l1Cache, 
bucketCache);
+
+      if (bucketCache == null) {
+        return l1Cache;
+      }
+
+      if (conf.getBoolean(BLOCKCACHE_VICTIM_HANDLER_ENABLED_KEY,
+          BLOCKCACHE_VICTIM_HANDLER_ENABLED_DEFAULT)) {
+        return new InclusiveCombinedBlockCache(l1Cache, bucketCache);

Review Comment:
   Ok that makes sense. I've been looking at the implementation of the 
`victimHandler` code in LruBlockCache. A cache miss in L1 which is found in L2 
[is promoted back to 
L1](https://github.com/apache/hbase/blob/branch-2/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java#L512-L517).
 It seems like that would cause your change to not really improve GC, because 
we'd continually have META blocks going back and forth between L1 and L2.
   
   So your results were confusing to me, and I also tested this and my results 
were similar to yours. But then I noticed in branch-2.4 (and 2), we don't call 
[`L1.getBlock` unless 
`L1.containsBlock`](https://github.com/apache/hbase/blob/branch-2/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CombinedBlockCache.java#L81).
 This circumvents the victimHandler code above, explaining the results.
   
   But here in master branch, an [improvement was 
made](https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CombinedBlockCache.java#L85-L96)
 to remove the unnecessary containsBlock call when we know the BlockType (most 
cases). So I think in master branch (or if that improvement is ever ported to 
branch-2) we'd probably stop seeing the GC improvement.
   
   So I was thinking about this and without the victim promotion back to L1 we 
basically have a case where certain blocks will get into L2 and never get back 
to L1. This artificially benefits META blocks which may be read later in the 
lifetime of the regionserver, because there is less competition for L1 space 
since more blocks will have flushed to L2. This doesn't quite feel like a real 
L1/L2 cache setup, instead it feels like a single cache where some blocks 
randomly exist on-heap and some off-heap. 
   
   So I do agree with you that we need to have a solution for the case where 
META blocks are much larger than the possible L1 size, but the current approach 
feels like an accidental solution rather than a complete one. A few thoughts I 
had:
   
   1. Just disable LRU, use a single unified BucketCache as the L1. Personally 
I don't think the BucketCache performs that much worse than LRU. In my tests 
it's always an improvement over LRU, and @anoopsjohn's reports from years ago 
Alibaba tests show that BucketCache is 30% faster. 
   2. Use LruAdaptiveBlockCache as the L1, which would help reduce the amount 
of churn between the L1 and L2. This algorithm is cool, but comes with many 
complicated configurations.
   3. Enhance the victim handler code to reduce the likelihood of promoting a 
victim back into L1. A good example here might be to inspect the accessCount on 
the victim block and only promote after multiple access.
   
   I'm guessing option 3 might be the most generally acceptable solution, if 
it's possible. I think the simplest form would check the BlockPriority of the 
L2-cached block and only promote when it's BlockPriority.MULTI. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to