[ 
https://issues.apache.org/jira/browse/HBASE-23887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17128349#comment-17128349
 ] 

Danil Lipovoy edited comment on HBASE-23887 at 6/8/20, 2:38 PM:
----------------------------------------------------------------

Is it OK for the summary doc?

—

Sometimes we read more data than can fit into the BlockCache, and this causes a 
high rate of evictions.
 This in turn leads to heavy Garbage Collector work. A lot of blocks are put 
into the BlockCache but never read, yet they cost a lot of CPU resources to 
clean up.

!BlockCacheEvictionProcess.gif!

We can avoid this situation via the following parameters:

*hbase.lru.cache.heavy.eviction.count.limit* - sets how many times the eviction 
process has to run before we start skipping puts into the BlockCache. By default 
it is 2147483647, which effectively disables the feature, because eviction runs 
about every 5-10 seconds (it depends on the workload) and 
2147483647 * 10 / 60 / 60 / 24 / 365 = 680 years; only after that time would it 
start to work. We can set this parameter to 0 and the feature starts working 
right away.

But if we sometimes have short reads of the same data and sometimes long-term 
reads, we can separate the two cases with this parameter. 
 For example, if we know that our short reads usually last about 1 minute, we 
should set the parameter to about 10, and the feature will be enabled only for 
long, massive reads (after ~100 seconds). So when we do short reads and want all 
of those blocks in the cache, we will have them there (except the evicted ones, 
of course). When we do long-term heavy reads, the feature will be enabled after 
some time and bring better performance.
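
Just to illustrate the arithmetic of that example (a rough sketch; the ~10 second 
eviction interval is an assumption taken from the text above, the real interval 
depends on the workload):

{code:java}
// Illustrative only: how long the feature takes to activate for a given
// hbase.lru.cache.heavy.eviction.count.limit, assuming eviction runs ~every 10 s.
int evictionIntervalSec = 10;  // assumed average interval between eviction runs
int countLimit = 10;           // hbase.lru.cache.heavy.eviction.count.limit
int shortReadSec = 60;         // typical duration of our short reads

int activationDelaySec = countLimit * evictionIntervalSec;    // ~100 s
boolean shortReadsStayCached = activationDelaySec > shortReadSec; // true
System.out.println("Feature activates after ~" + activationDelaySec + " s");
{code}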

 

*hbase.lru.cache.heavy.eviction.mb.size.limit* - sets how many megabytes we 
would like to put into the BlockCache (and have evicted from it) per period. The 
feature will try to reach this value and maintain it. Don't set it too small, 
because that leads to a premature exit from this mode. For a powerful CPU (about 
20-40 physical cores) it could be about 400-500 MB. An average system (~10 cores) 
200-300 MB. Some weak systems (2-5 cores) may be fine with 50-100 MB.

How it works: we set the limit, and after each ~10 second period we calculate how 
many bytes were freed.

Overhead = Freed Bytes Sum (MB) * 100 / Limit (MB) - 100;

For example, we set the limit = 500 and 2000 MB were evicted. The overhead is:

2000 * 100 / 500 - 100 = 300%

The feature will reduce the percentage of data blocks that are cached and bring 
the evicted bytes closer to 100% of the limit (500 MB). So it is a kind of 
auto-scaling.

If fewer bytes were freed than the limit, we get a negative overhead, for example 
if 200 MB were freed:

200 * 100 / 500 - 100 = -60% 

The feature will increase the percentage of cached blocks and bring the evicted 
bytes closer to 100% of the limit (500 MB).
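
The same calculation in code (a minimal sketch of the formula above, not the 
patch itself):

{code:java}
// Overhead = Freed Bytes Sum (MB) * 100 / Limit (MB) - 100
long limitMb = 500;   // hbase.lru.cache.heavy.eviction.mb.size.limit

long heavyFreedMb = 2000;
long overheadHeavy = heavyFreedMb * 100 / limitMb - 100;  // 300 (%) -> cache fewer blocks

long lightFreedMb = 200;
long overheadLight = lightFreedMb * 100 / limitMb - 100;  // -60 (%) -> cache more blocks again
{code}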

We can find the current situation in the RegionServer log:

BlockCache evicted (MB): 0, overhead (%): -100, heavy eviction counter: 0, current caching DataBlock (%): 100 < no eviction, 100% of blocks are cached
BlockCache evicted (MB): 2000, overhead (%): 300, heavy eviction counter: 1, current caching DataBlock (%): 97 < eviction begins, reduce cached blocks

This helps to tune your system and find out which value is better to set. Don't 
try to reach 0% overhead, it is impossible. An overhead of 30-100% is quite good 
and prevents a premature exit from this mode.

 

*hbase.lru.cache.heavy.eviction.overhead.coefficient* - sets how fast we want to 
get the result. If we know that our heavy reading lasts a long time, we don't 
want to wait and can increase the coefficient to get good performance sooner. 
But if we are not sure, we can do it slowly, which helps prevent a premature 
exit from this mode. So when the coefficient is higher we get better performance 
while heavy reading is stable, but when the reading pattern changes we can adapt 
to it by setting the coefficient to a lower value.

For example, we set the coefficient = 0.01. It means the overhead (see above) 
will be multiplied by 0.01, and the result is the amount by which the percentage 
of cached blocks is reduced. For example, if the overhead = 300% and the 
coefficient = 0.01, then the percentage of cached blocks will be reduced by 3%.

Similar logic applies when the overhead is negative (overshooting). Maybe it is 
just a short-term fluctuation, so we try to stay in this mode; this helps avoid 
a premature exit during short-term fluctuations. The backpressure has simple 
logic: the more overshooting, the more blocks we cache.
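
Roughly, the adjustment could look like this (a simplified sketch of the logic 
described above, not the exact patch code; in particular the backpressure 
scaling for negative overhead is not reproduced here):

{code:java}
double coefficient = 0.01;  // hbase.lru.cache.heavy.eviction.overhead.coefficient
int cachingPercent = 100;   // current caching DataBlock (%)
long limitMb = 500;         // hbase.lru.cache.heavy.eviction.mb.size.limit
long freedMb = 2000;        // evicted during the last ~10 s period

long overhead = freedMb * 100 / limitMb - 100;     // 300 (%)
// Positive overhead reduces the caching percentage, negative overhead
// (overshooting) increases it again, proportionally to the coefficient.
cachingPercent -= (int) (overhead * coefficient);  // 100 - 3 = 97
cachingPercent = Math.max(1, Math.min(100, cachingPercent));
{code}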

!image-2020-06-08-17-38-52-579.png!

 

Finally, how reducing the percentage of cached blocks works. Imagine we have a 
very small cache that can fit only 1 block, and we are trying to read 3 blocks 
with offsets:
 124
 198
 223

Without the feature, or when *hbase.lru.cache.heavy.eviction.count.limit* = 
2147483647, we will:

put 124, then put 198, evict 124, put 223, evict 198

A lot of work (5 actions and 2 evictions).

 

With the feature and hbase.lru.cache.heavy.eviction.count.limit = 0, suppose the 
auto-scaling has brought the caching percentage down to 97% (see the log excerpt 
above). The last two digits of an offset are evenly distributed from 0 to 99. 
When we take the offset modulo 100 we get:

124 -> 24
 198 -> 98 < this block will not be put into the BlockCache, because 98 is above 97
 223 -> 23

It means we will not try to handle block 198 at all and save the CPU for other 
work. As a result, we put block 124, then put 223, and evict 124 (just 3 actions 
and 1 eviction).
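
The same decision in code form (a minimal sketch of the modulus check, mirroring 
the example above; not the actual patch code):

{code:java}
int cachingPercent = 97;            // current caching DataBlock (%)
long[] offsets = {124, 198, 223};
for (long offset : offsets) {
  // blocks whose last two digits are >= the caching percentage are skipped
  boolean skip = offset % 100 >= cachingPercent;
  System.out.println(offset + " -> " + offset % 100 + (skip ? " (skipped)" : " (cached)"));
}
// prints: 124 -> 24 (cached), 198 -> 98 (skipped), 223 -> 23 (cached)
{code}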

As a result, for some systems it can increase performance by up to 3 times:

!image-2020-06-08-17-38-45-159.png!

-----------

[~vjasani], 
 >>Btw have you got contributor access? Jira is still not assigned to you.

Not yet, it would be good to get it)


> BlockCache performance improve by reduce eviction rate
> ------------------------------------------------------
>
>                 Key: HBASE-23887
>                 URL: https://issues.apache.org/jira/browse/HBASE-23887
>             Project: HBase
>          Issue Type: Improvement
>          Components: BlockCache, Performance
>            Reporter: Danil Lipovoy
>            Priority: Minor
>         Attachments: 1582787018434_rs_metrics.jpg, 
> 1582801838065_rs_metrics_new.png, BC_LongRun.png, 
> BlockCacheEvictionProcess.gif, cmp.png, evict_BC100_vs_BC23.png, 
> eviction_100p.png, eviction_100p.png, eviction_100p.png, gc_100p.png, 
> graph.png, image-2020-06-07-08-11-11-929.png, 
> image-2020-06-07-08-19-00-922.png, image-2020-06-07-12-07-24-903.png, 
> image-2020-06-07-12-07-30-307.png, image-2020-06-08-17-38-45-159.png, 
> image-2020-06-08-17-38-52-579.png, read_requests_100pBC_vs_23pBC.png, 
> requests_100p.png, requests_100p.png, requests_new2_100p.png, 
> requests_new_100p.png, scan.png, wave.png
>
>
> Hi!
> I am here for the first time, please correct me if something is wrong.
> I want to propose how to improve performance when the data in HFiles is much larger 
> than the BlockCache (a usual story in BigData). The idea: cache only part of the DATA 
> blocks. It is good because the LruBlockCache starts to work and saves a huge amount 
> of GC. 
> Sometimes we have more data than can fit into the BlockCache, and this causes a 
> high rate of evictions. In this case we can skip caching block N and instead 
> cache block N+1. We would evict block N quite soon anyway, which is why 
> skipping it is good for performance.
> Example:
> Imagine we have a small cache that can fit only 1 block, and we are trying to 
> read 3 blocks with offsets:
> 124
> 198
> 223
> Current way: we put block 124, then put 198, evict 124, put 223, evict 
> 198. A lot of work (5 actions).
> With the feature: the last two digits of an offset are evenly distributed from 
> 0 to 99. When we take the offset modulo 100 we get:
> 124 -> 24
> 198 -> 98
> 223 -> 23
> It helps to partition them. Some part, for example below 50 (if we set 
> *hbase.lru.cache.data.block.percent* = 50), goes into the cache, and the 
> others are skipped. It means we will not try to handle block 198 and save the 
> CPU for other work. As a result, we put block 124, then put 223, evict 124 (3 
> actions). 
> See the picture in the attachment with the test below. Requests per second are 
> higher, GC is lower.
>  
> The key point of the code:
> Added the parameter *hbase.lru.cache.data.block.percent*, which by default = 100.
>  
> But if we set it to a value between 1 and 99, the following logic kicks in:
>  
> {code:java}
> public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean inMemory) {
>   // Skip caching a DATA block when its offset falls outside the configured percentage.
>   if (cacheDataBlockPercent != 100 && buf.getBlockType().isData()) {
>     if (cacheKey.getOffset() % 100 >= cacheDataBlockPercent) {
>       return;
>     }
>   }
>   ...
>   // the same code as usual
> }
> {code}
>  
> Other parameters help to control when this logic is enabled, so that it 
> works only while heavy reading is going on.
> hbase.lru.cache.heavy.eviction.count.limit - sets how many times the eviction 
> process has to run before we start skipping puts into the BlockCache
> hbase.lru.cache.heavy.eviction.bytes.size.limit - sets how many bytes have to be 
> evicted each time before we start skipping puts into the BlockCache
> By default: if 10 times in a row (100 seconds) more than 10 MB were evicted each 
> time, then we start to skip 50% of data blocks.
> When the heavy eviction process ends, the new logic switches off and all 
> blocks are put into the BlockCache again.
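> Just as an illustration (not part of the patch), these parameters could be set 
> via the standard Hadoop/HBase Configuration API, using the default-like values 
> described above:
> {code:java}
> // org.apache.hadoop.conf.Configuration / org.apache.hadoop.hbase.HBaseConfiguration
> Configuration conf = HBaseConfiguration.create();
> conf.setInt("hbase.lru.cache.heavy.eviction.count.limit", 10);
> conf.setLong("hbase.lru.cache.heavy.eviction.bytes.size.limit", 10L * 1024 * 1024); // 10 MB
> conf.setInt("hbase.lru.cache.data.block.percent", 50);
> {code}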
>  
> Description of the test:
> 4 nodes, E5-2698 v4 @ 2.20GHz, 700 GB RAM each.
> 4 RegionServers
> 4 tables by 64 regions by 1.88 GB of data in each = 600 GB total (FAST_DIFF only)
> Total BlockCache size = 48 GB (8% of the data in HFiles)
> Random reads in 20 threads
>  
> I am going to make a Pull Request; I hope it is the right way to make some 
> contribution to this cool product.
>  


