[ https://issues.apache.org/jira/browse/HBASE-23887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17120590#comment-17120590 ]

Danil Lipovoy edited comment on HBASE-23887 at 5/31/20, 8:43 PM:
-----------------------------------------------------------------

All tests below were run on my home PC: _AMD Ryzen 7 2700X Eight-Core Processor (3150 MHz, 16 threads)._

The auto-scaling logic (see the description here):

 
{code:java}
public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean inMemory) {
  // Skip caching this DATA block if its offset falls outside the allowed percentage.
  // Offsets modulo 100 are spread roughly evenly from 0 to 99, so this caches about
  // cacheDataBlockPercent % of the data blocks.
  if (cacheDataBlockPercent != 100 && buf.getBlockType().isData()) {
    if (cacheKey.getOffset() % 100 >= cacheDataBlockPercent) {
      return;
    }
  }
...{code}
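
To illustrate the offset check, here is a small standalone sketch (not HBase code; the class name and the 50% value are made up for the demo). It reproduces the 124/198/223 example from the issue description:
{code:java}
public class OffsetFilterDemo {
  public static void main(String[] args) {
    int cacheDataBlockPercent = 50; // demo value, i.e. hbase.lru.cache.data.block.percent = 50
    long[] offsets = {124, 198, 223};
    for (long offset : offsets) {
      // Same check as in cacheBlock(): 124 -> 24 and 223 -> 23 are cached,
      // 198 -> 98 is skipped.
      boolean skip = offset % 100 >= cacheDataBlockPercent;
      System.out.println(offset + " % 100 = " + (offset % 100) + (skip ? " -> skip" : " -> cache"));
    }
  }
}
{code}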
And here is how cacheDataBlockPercent is calculated:
{code:java}
public void run() {
  ...
  LruBlockCache cache = this.cache.get();
  if (cache == null) break;
  bytesFreed = cache.evict();
  // We need to track how long cache.evict() has been working.
  long stopTime = System.currentTimeMillis();
  // Control the BlockCache under heavy eviction: it helps avoid putting too many
  // blocks into the BlockCache while evict() is working very actively.
  if (stopTime - startTime <= 1000 * 10 - 1) {
    // Less than 10 seconds have passed; just keep summing up.
    mbFreedSum += bytesFreed / 1024 / 1024;
  } else {
    freedDataOverheadPercent =
        (int) (mbFreedSum * 100 / cache.heavyEvictionBytesSizeLimit) - 100;
    if (mbFreedSum > cache.heavyEvictionBytesSizeLimit) {
      heavyEvictionCount++;
      if (heavyEvictionCount > cache.heavyEvictionCountLimit) {
        if (freedDataOverheadPercent > 100) {
          cache.cacheDataBlockPercent -= 3;
        } else if (freedDataOverheadPercent > 50) {
          cache.cacheDataBlockPercent -= 1;
        } else if (freedDataOverheadPercent < 30) {
          cache.cacheDataBlockPercent += 1;
        }
      }
    } else {
      if (mbFreedSum > cache.heavyEvictionBytesSizeLimit * 0.5
          && cache.cacheDataBlockPercent < 50) {
        // Helps prevent a premature exit caused by an accidental fluctuation.
        // It would be fine to add more logic here.
        cache.cacheDataBlockPercent += 5;
      } else {
        heavyEvictionCount = 0;
        cache.cacheDataBlockPercent = 100;
      }
    }
    LOG.info("BlockCache evicted (MB): {}, overhead (%): {}, " +
                    "heavy eviction counter: {}, " +
                    "current caching DataBlock (%): {}",
            mbFreedSum, freedDataOverheadPercent,
            heavyEvictionCount, cache.cacheDataBlockPercent);

    mbFreedSum = 0;
    startTime = stopTime;
  }
{code}
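
As a worked example (a sketch only, assuming heavyEvictionBytesSizeLimit holds the limit in MB, since the code above compares it directly with mbFreedSum): with *hbase.lru.cache.heavy.eviction.mb.size.limit* = 200, one of the periods in the log below frees 8713 MB, which gives the reported overhead of 4256%:
{code:java}
long heavyEvictionBytesSizeLimit = 200; // MB, hbase.lru.cache.heavy.eviction.mb.size.limit
long mbFreedSum = 8713;                 // MB actually evicted during the ~10 second period
int freedDataOverheadPercent =
    (int) (mbFreedSum * 100 / heavyEvictionBytesSizeLimit) - 100; // = 4256, as in the log below
{code}
Since 4256 > 100, the percentage of cached DataBlocks is reduced by 3 on that step.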
 

 

I prepared 4 tables (32 regions each):

tbl1 - 200 million records, 100 bytes each. Total size 30 GB.

tbl2 - 20 million records, 500 bytes each. Total size 10.4 GB.

tbl3 - 100 million records, 100 bytes each. Total size 15.4 GB.

tbl4 - the same as tbl3, but used for testing reads in batches (batchSize=100).

Workload scenario "u":

_operationcount=50 000 000 (for tbl4 only 500 000 because each operation is a batch of 100)_
 _readproportion=1_
 _requestdistribution=uniform_

 

Workload scenario "z":

_operationcount=50 000 000 (for tbl4 only 500 000 because each operation is a batch of 100)_
 _readproportion=1_
 _requestdistribution=zipfian_

 

Workload scenario "l":

_operationcount=50 000 000 (for tbl4 only 500 000 because each operation is a batch of 100)_
 _readproportion=1_
 _requestdistribution=latest_

 

Then I ran all tables with all scenarios on the original version (4*3 = 12 tests in total) and another 12 tests with the feature:

*hbase.lru.cache.heavy.eviction.count.limit* = 3

*hbase.lru.cache.heavy.eviction.mb.size.limit* = 200
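
For reference, a minimal sketch of setting these values programmatically (in a real cluster they would normally go into hbase-site.xml; the class name here is made up for the example):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HeavyEvictionTuning {
  public static Configuration tunedConf() {
    Configuration conf = HBaseConfiguration.create();
    // Start reducing the cached DataBlock percentage only after 3 heavy eviction periods...
    conf.setInt("hbase.lru.cache.heavy.eviction.count.limit", 3);
    // ...and only when more than 200 MB were evicted during a period.
    conf.setLong("hbase.lru.cache.heavy.eviction.mb.size.limit", 200);
    return conf;
  }
}
{code}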

Performance results:

!requests_100p.png!

On the second graph we can see that the lines have a step at the beginning. That is the auto-scaling at work.

Let's look at the RegionServer log:

LruBlockCache: BlockCache evicted (MB): 0, overhead (%): -100, heavy eviction 
counter: 0, current caching DataBlock (%): 100   | no load, do nothing
 LruBlockCache: BlockCache evicted (MB): 0, overhead (%): -100, heavy eviction 
counter: 0, current caching DataBlock (%): 100
 LruBlockCache: BlockCache evicted (MB): 0, overhead (%): -100, heavy eviction 
counter: 0, current caching DataBlock (%): 100
 LruBlockCache: BlockCache evicted (MB): 229, overhead (%): 14, heavy eviction 
counter: 1, current caching DataBlock (%): 100  | reading has started but *count.limit* hasn't been reached yet
 LruBlockCache: BlockCache evicted (MB): 6958, overhead (%): 3379, heavy 
eviction counter: 2, current caching DataBlock (%): 100 
 LruBlockCache: BlockCache evicted (MB): 8117, overhead (%): 3958, heavy 
eviction counter: 3, current caching DataBlock (%): 100
 LruBlockCache: BlockCache evicted (MB): 8713, overhead (%): 4256, heavy 
eviction counter: 4, current caching DataBlock (%): 97  | *count.limit* has been reached, decrease by 3%
 LruBlockCache: BlockCache evicted (MB): 8723, overhead (%): 4261, heavy 
eviction counter: 5, current caching DataBlock (%): 94
 LruBlockCache: BlockCache evicted (MB): 8318, overhead (%): 4059, heavy 
eviction counter: 6, current caching DataBlock (%): 91
 LruBlockCache: BlockCache evicted (MB): 7722, overhead (%): 3761, heavy 
eviction counter: 7, current caching DataBlock (%): 88
 LruBlockCache: BlockCache evicted (MB): 7840, overhead (%): 3820, heavy 
eviction counter: 8, current caching DataBlock (%): 85
 LruBlockCache: BlockCache evicted (MB): 8032, overhead (%): 3916, heavy 
eviction counter: 9, current caching DataBlock (%): 82
 LruBlockCache: BlockCache evicted (MB): 7687, overhead (%): 3743, heavy 
eviction counter: 10, current caching DataBlock (%): 79
 LruBlockCache: BlockCache evicted (MB): 7458, overhead (%): 3629, heavy 
eviction counter: 11, current caching DataBlock (%): 76
 LruBlockCache: BlockCache evicted (MB): 7343, overhead (%): 3571, heavy 
eviction counter: 12, current caching DataBlock (%): 73
 LruBlockCache: BlockCache evicted (MB): 6769, overhead (%): 3284, heavy 
eviction counter: 13, current caching DataBlock (%): 70
 LruBlockCache: BlockCache evicted (MB): 6655, overhead (%): 3227, heavy 
eviction counter: 14, current caching DataBlock (%): 67
 LruBlockCache: BlockCache evicted (MB): 6080, overhead (%): 2940, heavy 
eviction counter: 15, current caching DataBlock (%): 64
 LruBlockCache: BlockCache evicted (MB): 5851, overhead (%): 2825, heavy 
eviction counter: 16, current caching DataBlock (%): 61
 LruBlockCache: BlockCache evicted (MB): 5277, overhead (%): 2538, heavy 
eviction counter: 17, current caching DataBlock (%): 58
 LruBlockCache: BlockCache evicted (MB): 4933, overhead (%): 2366, heavy 
eviction counter: 18, current caching DataBlock (%): 55
 LruBlockCache: BlockCache evicted (MB): 4359, overhead (%): 2079, heavy 
eviction counter: 19, current caching DataBlock (%): 52
 LruBlockCache: BlockCache evicted (MB): 4015, overhead (%): 1907, heavy 
eviction counter: 20, current caching DataBlock (%): 49
 LruBlockCache: BlockCache evicted (MB): 3556, overhead (%): 1678, heavy 
eviction counter: 21, current caching DataBlock (%): 46
 LruBlockCache: BlockCache evicted (MB): 3097, overhead (%): 1448, heavy 
eviction counter: 22, current caching DataBlock (%): 43
 LruBlockCache: BlockCache evicted (MB): 2638, overhead (%): 1219, heavy 
eviction counter: 23, current caching DataBlock (%): 40
 LruBlockCache: BlockCache evicted (MB): 2179, overhead (%): 989, heavy 
eviction counter: 24, current caching DataBlock (%): 37
 LruBlockCache: BlockCache evicted (MB): 1835, overhead (%): 817, heavy 
eviction counter: 25, current caching DataBlock (%): 34
 LruBlockCache: BlockCache evicted (MB): 1491, overhead (%): 645, heavy 
eviction counter: 26, current caching DataBlock (%): 31
 LruBlockCache: BlockCache evicted (MB): 1032, overhead (%): 416, heavy 
eviction counter: 27, current caching DataBlock (%): 28
 LruBlockCache: BlockCache evicted (MB): 688, overhead (%): 244, heavy eviction 
counter: 28, current caching DataBlock (%): 25
 LruBlockCache: BlockCache evicted (MB): 458, overhead (%): 129, heavy eviction 
counter: 29, current caching DataBlock (%): 22
 LruBlockCache: BlockCache evicted (MB): 229, overhead (%): 14, heavy eviction 
counter: 30, current caching DataBlock (%): 23  | wow, too low! go up by 1%
 LruBlockCache: BlockCache evicted (MB): 114, overhead (%): -43, heavy eviction 
counter: 30, current caching DataBlock (%): 28  | accidental fluctuation? plus 5
 LruBlockCache: BlockCache evicted (MB): 344, overhead (%): 72, heavy eviction 
counter: 31, current caching DataBlock (%): 27  | now ok, continue slowing down
 LruBlockCache: BlockCache evicted (MB): 344, overhead (%): 72, heavy eviction 
counter: 32, current caching DataBlock (%): 26
 LruBlockCache: BlockCache evicted (MB): 229, overhead (%): 14, heavy eviction 
counter: 33, current caching DataBlock (%): 27
 LruBlockCache: BlockCache evicted (MB): 229, overhead (%): 14, heavy eviction 
counter: 34, current caching DataBlock (%): 28
 LruBlockCache: BlockCache evicted (MB): 344, overhead (%): 72, heavy eviction 
counter: 35, current caching DataBlock (%): 27
 LruBlockCache: BlockCache evicted (MB): 229, overhead (%): 14, heavy eviction 
counter: 36, current caching DataBlock (%): 28
 LruBlockCache: BlockCache evicted (MB): 229, overhead (%): 14, heavy eviction 
counter: 37, current caching DataBlock (%): 29
 LruBlockCache: BlockCache evicted (MB): 458, overhead (%): 129, heavy eviction 
counter: 38, current caching DataBlock (%): 26
 LruBlockCache: BlockCache evicted (MB): 344, overhead (%): 72, heavy eviction 
counter: 39, current caching DataBlock (%): 25
 LruBlockCache: BlockCache evicted (MB): 229, overhead (%): 14, heavy eviction 
counter: 40, current caching DataBlock (%): 26

Take a look at how the eviction process works. We can see the same spot on the second graph (at the beginning):
 !eviction_100p.png!

 

Of course, there is much less GC when we use the feature:

!gc_100p.png!

So, I collected all the YCSB results into tables:
| |*original*|*feature*|*feature/original, %*|
|tbl1-u (ops/sec)|33,191|46,587|140|
|tbl2-u (ops/sec)|41,959|62,695|149|
|tbl3-u (ops/sec)|41,485|61,407|148|
|tbl4-u (ops/sec)|382|638|167|
|tbl1-z (ops/sec)|51,077|60,264|118|
|tbl2-z (ops/sec)|57,103|70,809|124|
|tbl3-z (ops/sec)|59,796|69,426|116|
|tbl4-z (ops/sec)|500|724|145|
|tbl1-l (ops/sec)|71,857|77,682|108|
|tbl2-l (ops/sec)|74,836|82,893|111|
|tbl3-l (ops/sec)|74,573|78,871|106|
|tbl4-l (ops/sec)|647|821|127|

For long-running workloads the improvement will be even bigger, because some time at the beginning is spent on auto-scaling.

And some information about latency:
| |*original*|*feature*|*feature/original, %*|
|tbl1-u AverageLatency(us)|1,503|1,071|71|
|tbl2-u AverageLatency(us)|1,189|795|67|
|tbl3-u AverageLatency(us)|1,203|812|68|
|tbl4-u AverageLatency(us)|65,285|39,134|60|
|tbl1-z AverageLatency(us)|976|827|85|
|tbl2-z AverageLatency(us)|873|704|81|
|tbl3-z AverageLatency(us)|834|718|86|
|tbl4-z AverageLatency(us)|49,831|34,435|69|
|tbl1-l AverageLatency(us)|694|641|92|
|tbl2-l AverageLatency(us)|666|601|90|
|tbl3-l AverageLatency(us)|668|632|95|
|tbl4-l AverageLatency(us)|38,501|30,342|79|

 
| |*original*|*feature*|*feature/original, %*|
|tbl1-u 95thPercentileLatency(us)|2,231|2,071|93|
|tbl2-u 95thPercentileLatency(us)|1,134|1,044|92|
|tbl3-u 95thPercentileLatency(us)|1,274|1,136|89|
|tbl4-u 95thPercentileLatency(us)|340,991|54,111|16|
|tbl1-z 95thPercentileLatency(us)|1,459|1,521|104|
|tbl2-z 95thPercentileLatency(us)|891|896|101|
|tbl3-z 95thPercentileLatency(us)|931|968|104|
|tbl4-z 95thPercentileLatency(us)|316,159|55,135|17|
|tbl1-l 95thPercentileLatency(us)|992|997|101|
|tbl2-l 95thPercentileLatency(us)|773|746|97|
|tbl3-l 95thPercentileLatency(us)|801|833|104|
|tbl4-l 95thPercentileLatency(us)|67,583|54,143|80|



> BlockCache performance improve by reduce eviction rate
> ------------------------------------------------------
>
>                 Key: HBASE-23887
>                 URL: https://issues.apache.org/jira/browse/HBASE-23887
>             Project: HBase
>          Issue Type: Improvement
>          Components: BlockCache, Performance
>            Reporter: Danil Lipovoy
>            Priority: Minor
>         Attachments: 1582787018434_rs_metrics.jpg, 
> 1582801838065_rs_metrics_new.png, BC_LongRun.png, 
> BlockCacheEvictionProcess.gif, cmp.png, evict_BC100_vs_BC23.png, 
> eviction_100p.png, eviction_100p.png, eviction_100p.png, gc_100p.png, 
> read_requests_100pBC_vs_23pBC.png, requests_100p.png, requests_100p.png
>
>
> Hi!
> I am here for the first time; please correct me if something is wrong.
> I want to propose a way to improve performance when the data in HFiles is much bigger 
> than the BlockCache (a usual story in BigData). The idea is to cache only part of the DATA 
> blocks. It is good because the LruBlockCache starts to work effectively and saves a huge 
> amount of GC. 
> Sometimes we have more data than can fit into the BlockCache, and that causes a 
> high rate of evictions. In this case we can skip caching block N and instead 
> cache the (N+1)th block. We would evict block N quite soon anyway, which is why 
> skipping it is good for performance.
> Example:
> Imagine we have a small cache that can fit only 1 block, and we are trying to 
> read 3 blocks with offsets:
> 124
> 198
> 223
> The current way: we put block 124, then put 198, evict 124, put 223, evict 
> 198. A lot of work (5 actions).
> With the feature: the last two digits of the offsets are evenly distributed from 
> 0 to 99. When we take them modulo 100 we get:
> 124 -> 24
> 198 -> 98
> 223 -> 23
> This lets us partition them. Some part, for example the ones below 50 (if we set 
> *hbase.lru.cache.data.block.percent* = 50), goes into the cache, and the 
> others are skipped. It means we will not try to handle block 198 and we save CPU 
> for other work. As a result we put block 124, then put 223, evict 124 (3 
> actions). 
> See the attached picture with the test below. Requests per second are higher, 
> GC is lower.
>  
> The key point of the code:
> Added the parameter *hbase.lru.cache.data.block.percent*, which is 100 by default.
>  
> But if we set it to 1-99, the following logic kicks in:
>  
>  
> {code:java}
> public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean inMemory) {
>   if (cacheDataBlockPercent != 100 && buf.getBlockType().isData()) {
>     if (cacheKey.getOffset() % 100 >= cacheDataBlockPercent) {
>       return;
>     }
>   }
>   ... 
>   // the same code as usual
> }
> {code}
>  
> Other parameters help to control when this logic is enabled. It means it 
> will work only while heavy reading is going on.
> hbase.lru.cache.heavy.eviction.count.limit - how many times in a row the eviction 
> process has to run before we start to avoid putting data into the BlockCache
> hbase.lru.cache.heavy.eviction.bytes.size.limit - how many bytes have to be 
> evicted each time before we start to avoid putting data into the BlockCache
> By default: if eviction runs 10 times (100 seconds) and frees more than 10 MB each 
> time, then we start to skip 50% of data blocks.
> When the heavy eviction process ends, the new logic switches off and all blocks are 
> put into the BlockCache again.
>  
> Description of the test:
> 4 nodes E5-2698 v4 @ 2.20GHz, 700 GB Mem.
> 4 RegionServers
> 4 tables by 64 regions by 1.88 GB data in each = 600 GB total (only FAST_DIFF)
> Total BlockCache Size = 48 GB (8% of the data in HFiles)
> Random read in 20 threads
>  
> I am going to open a Pull Request; I hope that is the right way to make a 
> contribution to this cool product.  
>  


