pustota2009 commented on a change in pull request #1200: HBASE 23887 BlockCache
performance improve
URL: https://github.com/apache/hbase/pull/1200#discussion_r383691217
##########
File path:
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java
##########
@@ -400,16 +413,24 @@ private Cacheable asReferencedHeapBlock(Cacheable buf) {
*/
@Override
public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean
inMemory) {
+ if (cacheDataBlockPercent != 100 && buf.getBlockType().isData()) {
+ // Don't cache this DATA block if we have limit on BlockCache,
+ // good for performance (HBASE-23887)
+ if (cacheKey.getOffset() % 100 >= cacheDataBlockPercent) {
Review comment:
Yes, I think you got the general idea: when the data is much larger than the cache,
we have overhead. Let me provide some details. Why do we do a lot of work:
1. Put a key-value into the map.
2. Evict it quite soon, because there is a long queue at the entrance.
3. Clean up the garbage.
But the probability of a cache hit is still determined by the size of the cache.
That is why we can put a random subset of blocks into the cache and skip steps 1-3
for the rest of the blocks. How it works:
Imagine we have a tiny cache with room for just one block, and we try to read 3
blocks with these offsets (within their files):
124
198
223
The original way: we put block 124, then put 198, evict 124, put 223, evict
198. A lot of work (5 actions).
With the feature, the last two digits of the offsets are roughly evenly distributed
from 0 to 99. When we take the offset modulo 100 we get:
124 -> 24
198 -> 98
223 -> 23
This lets us partition the blocks. Some portion, for example those below 50 (if we
set cacheDataBlockPercent = 50), go into the cache, and the others are skipped. It
means we never try to work with block 198 and save the CPU for other work. As a
result, we put block 124, then put 223, evict 124 (3 actions). That's the idea.
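The decision rule above can be sketched as a small predicate. This is only an
illustration, not the actual patch code: the class name, constructor, and the
`shouldCache` method are hypothetical; only the `offset % 100 < cacheDataBlockPercent`
check mirrors the condition in the diff (the diff tests the inverse, `>=`, to skip).

```java
// Sketch (not the real LruBlockCache change): filter DATA blocks by file
// offset so that only about cacheDataBlockPercent% of them enter the cache.
public class BlockCacheFilterSketch {
  private final int cacheDataBlockPercent; // assumed range: 1..100

  public BlockCacheFilterSketch(int cacheDataBlockPercent) {
    this.cacheDataBlockPercent = cacheDataBlockPercent;
  }

  /** Returns true if a data block at this offset should be cached. */
  public boolean shouldCache(long offset) {
    // The last two decimal digits of block offsets are roughly uniform in
    // 0..99, so this admits about cacheDataBlockPercent% of data blocks.
    return offset % 100 < cacheDataBlockPercent;
  }

  public static void main(String[] args) {
    BlockCacheFilterSketch f = new BlockCacheFilterSketch(50);
    System.out.println(f.shouldCache(124)); // 124 % 100 = 24 < 50 -> true
    System.out.println(f.shouldCache(198)); // 198 % 100 = 98 >= 50 -> false
    System.out.println(f.shouldCache(223)); // 223 % 100 = 23 < 50 -> true
  }
}
```

With cacheDataBlockPercent = 50, blocks 124 and 223 are admitted and 198 is
skipped, which reproduces the 3-action scenario from the example above.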
If all is OK, we can later make a few small improvements:
1. Don't even try to get DATA blocks from the cache when they can't be there (198
in the case above).
2. Populate the cache with data blocks anyway while the cache is empty.
It also looks like a good improvement to apply the same logic to the L2 cache.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services