pustota2009 commented on a change in pull request #1200: HBASE 23887 BlockCache 
performance improve
URL: https://github.com/apache/hbase/pull/1200#discussion_r383691217
 
 

 ##########
 File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java
 ##########
 @@ -400,16 +413,24 @@ private Cacheable asReferencedHeapBlock(Cacheable buf) {
    */
   @Override
   public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean 
inMemory) {
+    if (cacheDataBlockPercent != 100 && buf.getBlockType().isData()) {
+      // Don't cache this DATA block if we have limit on BlockCache,
+      // good for performance (HBASE-23887)
+      if (cacheKey.getOffset() % 100 >= cacheDataBlockPercent) {
 
 Review comment:
   Yes, I think you got the general idea - when the data is much larger than the 
cache, we have overhead. Let me provide some details. Why we do a lot of work:
   1. Put a key-value into the map.
   2. Delete it quite soon, because there is a big queue at the entrance.
   3. Clean up the garbage.
   
   But anyway, the probability of hitting the cache is defined by the size of the 
cache. That is why we can put some random blocks into the cache and skip steps 1-3 
for the rest of the blocks. How it works:
   
   Imagine we have a tiny cache, with room for just 1 block, and we try to read 3 
blocks with these offsets (into files):
   124
   198
   223
   
   
   The original old way - we put block 124, then put 198 and evict 124, then put 
223 and evict 198. A lot of work (5 actions).
   
   With the patch - the last two digits of the offsets are evenly distributed from 
0 to 99. When we take the offset modulo 100 we get:
   124 -> 24
   198 -> 98
   223 -> 23
   
   This lets us sort the blocks. One part, for example those below 50 (if we set 
cacheDataBlockPercent = 50), goes into the cache, and we skip the others. It means 
we will never try to work with block 198 at all and save the CPU for other work. As 
a result - we put block 124, then put 223 and evict 124 (3 actions). That's the 
idea. 
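   The admission rule described above can be sketched roughly like this (a minimal 
standalone sketch of the idea, not the actual HBase code; the class and method 
names here are illustrative):

```java
// Sketch of the offset-based cache admission idea (HBASE-23887).
// Assumes block offsets are roughly uniformly distributed modulo 100.
public class CacheAdmission {
  private final int cacheDataBlockPercent; // e.g. 50 means ~50% of DATA blocks admitted

  public CacheAdmission(int cacheDataBlockPercent) {
    this.cacheDataBlockPercent = cacheDataBlockPercent;
  }

  /** A DATA block is admitted only if the last two digits of its
   *  file offset fall below the configured percent. */
  public boolean shouldCache(long offset) {
    if (cacheDataBlockPercent == 100) {
      return true; // no limit configured, cache everything
    }
    return offset % 100 < cacheDataBlockPercent;
  }

  public static void main(String[] args) {
    CacheAdmission adm = new CacheAdmission(50);
    // Offsets from the example: 124 -> 24, 198 -> 98, 223 -> 23
    System.out.println(adm.shouldCache(124)); // true  (24 < 50)
    System.out.println(adm.shouldCache(198)); // false (98 >= 50)
    System.out.println(adm.shouldCache(223)); // true  (23 < 50)
  }
}
```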
   
   If all is ok, later we can make some small improvements:
   1. Don't even try to get from the cache DATA blocks which can't be there (198 
in the case above). 
   2. Populate the cache with data blocks anyway while the cache is empty. 
   
   And it looks like a good improvement to provide the same logic for L2. 
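   Improvement (1) above could be sketched like this (a hypothetical standalone 
sketch, assuming a plain Map stands in for the real block cache; none of these 
names are actual HBase API):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of improvement (1): skip the cache lookup entirely for DATA
// blocks whose offset proves they could never have been admitted.
public class SkipLookupSketch {
  private final int cacheDataBlockPercent;
  private final Map<Long, byte[]> cache = new HashMap<>(); // stand-in for the block cache

  public SkipLookupSketch(int cacheDataBlockPercent) {
    this.cacheDataBlockPercent = cacheDataBlockPercent;
  }

  // Admission rule from the patch: only offsets whose last two digits
  // fall below the configured percent are ever cached.
  public void cacheDataBlock(long offset, byte[] block) {
    if (cacheDataBlockPercent == 100 || offset % 100 < cacheDataBlockPercent) {
      cache.put(offset, block);
    }
  }

  /** Returns null without probing the cache at all when the offset
   *  says the block was never admitted. */
  public byte[] getDataBlock(long offset) {
    if (cacheDataBlockPercent != 100 && offset % 100 >= cacheDataBlockPercent) {
      return null; // e.g. 198 -> 98: never admitted, no lookup needed
    }
    return cache.get(offset);
  }

  public static void main(String[] args) {
    SkipLookupSketch c = new SkipLookupSketch(50);
    c.cacheDataBlock(124, new byte[]{1});
    c.cacheDataBlock(198, new byte[]{2}); // rejected by the admission rule
    System.out.println(c.getDataBlock(124) != null); // true
    System.out.println(c.getDataBlock(198) != null); // false
  }
}
```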

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
