[ https://issues.apache.org/jira/browse/HBASE-23887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113632#comment-17113632 ]

Bharath Vissapragada commented on HBASE-23887:
----------------------------------------------

Thanks for the detailed explanation, benchmarks and visualizations. I think I 
see the intuition behind dynamically adapting the cache to control thrashing 
at high eviction rates. I have a few questions though, and am curious to know 
your thoughts.

1. Your CM charts show that the block cache miss rate (first chart on the 
bottom) increased almost 2x. That is expected, since you are no longer 
aggressively caching blocks, but it is a bit concerning, I think. All the read 
distributions show that the 95th and 99th percentile latencies are roughly 3x 
worse, except for the UNIFORM distribution. That is in line with your intuition 
too, I guess, since your algorithm works well when the distribution of offsets 
is uniform. Even with this drop in 95th and 99th percentile latencies, the 
average latency is still higher without the patch (take LATEST for example), 
meaning there are a few outliers with plain LRU and that's what hurts the 
throughput and average latency? Did you get a chance to dig into the results, 
and how would you interpret them?

2. What would be a pathological workload for your design? Do you foresee any 
specific workload pattern that works very well with LRU but regresses pretty 
badly with your patch?

3. Let's say you have a mix of 90% scan and 10% non-scan workload. Assuming the 
non-scan workload benefits from caching, wouldn't its performance be 
non-deterministic, depending on how the HFiles are laid out (say, before and 
after a compaction)? If we are lucky and the block offsets for the non-scan 
workload fall in the ideal range, we are good; otherwise those blocks are never 
even considered for caching. Does your patch handle this somehow, or did I 
miss it?
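
To make the concern concrete, here is a minimal sketch of the filter as I read 
it from the snippet in the description (cacheDataBlockPercent and getOffset() 
are from your patch; the mayCache helper is just for illustration):

{noformat}
// Whether a data block is even considered for caching is a pure
// function of its file offset. A compaction rewrites the HFiles and
// reshuffles offsets, so a block that is hot for the non-scan workload
// can silently move in or out of the cacheable range.
boolean mayCache(BlockCacheKey cacheKey, int cacheDataBlockPercent) {
  return cacheKey.getOffset() % 100 < cacheDataBlockPercent;
}
{noformat}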

4. Can you please expand on how one chooses the values for the newly added 
params? Are there any rough guidelines?

5. In the following piece of code, when you compute 'bytesFreed', you don't 
check whether the freed memory came from single-access or multi-access cache 
blocks (both carry equal weight). Should we weight single-access and 
multi-access blocks differently?

{noformat}
  bytesFreed = cache.evict();
  // Heavy-eviction control for the BlockCache: avoid putting too many
  // blocks into the cache while evict() is running very actively.
  if (bytesFreed > 0 && bytesFreed > cache.heavyEvictionBytesSizeLimit) {
    cache.heavyEvictionCount++;
  } else {
    cache.heavyEvictionCount = 0;
  }
{noformat}

My point being: since this is a segmented LRU with separate chunks of memory 
for single-access and multi-access blocks, constant eviction of single-access 
blocks probably signifies that a scan-based workload is in progress (see the 
following snippet of code). However, constant eviction of multi-access blocks 
means there are cache hits, and your optimization shouldn't kick in. Thoughts? 
(Would this help bring the cache miss count down?)

{noformat}
          // this means no need to evict block in memory bucket,
          // and we try best to make the ratio between single-bucket and
          // multi-bucket is 1:2
          long bytesRemain = s + m - bytesToFree;
          if (3 * s <= bytesRemain) {
            // single-bucket is small enough that no eviction happens for it
            // hence all eviction goes from multi-bucket
            bytesFreed = bucketMulti.free(bytesToFree);
          } else if (3 * m <= 2 * bytesRemain) {
            // multi-bucket is small enough that no eviction happens for it
            // hence all eviction goes from single-bucket
            bytesFreed = bucketSingle.free(bytesToFree);
          } else {
            // both buckets need to evict some blocks
            bytesFreed = bucketSingle.free(s - bytesRemain / 3);
            if (bytesFreed < bytesToFree) {
              bytesFreed += bucketMulti.free(bytesToFree - bytesFreed);
            }
          }
{noformat}
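
For concreteness, a rough sketch of the alternative I have in mind 
(hypothetical: it assumes evict() can report the bytes freed per bucket, e.g. 
via an EvictionResult with a singleBytesFreed field, which the current patch 
does not have):

{noformat}
  // Hypothetical: evict() reports how much was freed from each bucket.
  EvictionResult result = cache.evict();
  // Sustained eviction from the single-access bucket looks like a
  // scan-heavy workload, so only that arms the skip logic; churn in
  // the multi-access bucket means the cache is actually getting hits.
  if (result.singleBytesFreed > cache.heavyEvictionBytesSizeLimit) {
    cache.heavyEvictionCount++;
  } else {
    cache.heavyEvictionCount = 0;
  }
{noformat}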


> Improve BlockCache performance by reducing the eviction rate
> ------------------------------------------------------------
>
>                 Key: HBASE-23887
>                 URL: https://issues.apache.org/jira/browse/HBASE-23887
>             Project: HBase
>          Issue Type: Improvement
>          Components: BlockCache, Performance
>            Reporter: Danil Lipovoy
>            Priority: Minor
>         Attachments: 1582787018434_rs_metrics.jpg, 
> 1582801838065_rs_metrics_new.png, BC_LongRun.png, 
> BlockCacheEvictionProcess.gif, cmp.png, evict_BC100_vs_BC23.png, 
> read_requests_100pBC_vs_23pBC.png
>
>
> Hi!
> This is my first time here, so please correct me if something is wrong.
> I want to propose a way to improve performance when the data in HFiles is 
> much larger than the BlockCache (a usual story in BigData). The idea: cache 
> only a portion of the DATA blocks. This keeps LruBlockCache working 
> effectively and saves a huge amount of GC.
> Sometimes we have more data than can fit into the BlockCache, which causes a 
> high rate of evictions. In this case we can skip caching block N and instead 
> cache block N+1. We would have evicted block N quite soon anyway, which is 
> why skipping it is good for performance.
> Example:
> Imagine we have a tiny cache that can fit only 1 block, and we are trying to 
> read 3 blocks with offsets:
> 124
> 198
> 223
> The current way: we put block 124, then put 198 and evict 124, then put 223 
> and evict 198. A lot of work (5 actions).
> With the feature: the last two digits of the offsets are evenly distributed 
> from 0 to 99. Taking each offset modulo 100 we get:
> 124 -> 24
> 198 -> 98
> 223 -> 23
> This lets us partition the blocks. Those whose remainder falls below the 
> threshold, for example below 50 (if we set *hbase.lru.cache.data.block.percent* 
> = 50), go into the cache; the others are skipped. That means we never try to 
> handle block 198 and save the CPU for other work. As a result we put block 
> 124, then put 223 and evict 124 (3 actions).
> See the picture in the attachment with the test below. Requests per second 
> are higher and GC is lower.
>  
> The key point of the code: a new parameter, 
> *hbase.lru.cache.data.block.percent*, which defaults to 100.
>  
> If we set it to 1-99, the following logic kicks in:
>  
> {code:java}
> public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean inMemory) {
>   if (cacheDataBlockPercent != 100 && buf.getBlockType().isData()) {
>     if (cacheKey.getOffset() % 100 >= cacheDataBlockPercent) {
>       return;
>     }
>   }
>   ... // the same code as usual
> }
> {code}
>  
> Other parameters control when this logic is enabled, so that it only kicks 
> in while heavy reading is going on:
> hbase.lru.cache.heavy.eviction.count.limit - how many consecutive runs of 
> the eviction process it takes before we start skipping data blocks
> hbase.lru.cache.heavy.eviction.bytes.size.limit - how many bytes have to be 
> evicted on each run before we start skipping data blocks
> By default: if the eviction process runs 10 times in a row (100 seconds) and 
> evicts more than 10 MB each time, then we start skipping 50% of data blocks.
> When the heavy-eviction period ends, the new logic switches off and all 
> blocks go into the BlockCache again.
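> For illustration, the defaults described above could be set like this (a 
> hypothetical usage sketch, not part of the patch):
> {code:java}
> Configuration conf = HBaseConfiguration.create();
> // Cache only half of the DATA blocks while heavy eviction is in effect.
> conf.setInt("hbase.lru.cache.data.block.percent", 50);
> // Arm the skip logic after 10 consecutive heavy-eviction runs...
> conf.setInt("hbase.lru.cache.heavy.eviction.count.limit", 10);
> // ...where each run evicted more than 10 MB.
> conf.setLong("hbase.lru.cache.heavy.eviction.bytes.size.limit", 10L * 1024 * 1024);
> {code}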
>  
> Description of the test:
> 4 nodes, E5-2698 v4 @ 2.20GHz, 700 GB RAM each.
> 4 RegionServers
> 4 tables x 64 regions x 1.88 GB of data each = 600 GB total (FAST_DIFF only)
> Total BlockCache size = 48 GB (8% of the data in HFiles)
> Random reads in 20 threads
>  
> I am going to open a pull request; I hope this is the right way to make a 
> contribution to this cool product.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
