[ 
https://issues.apache.org/jira/browse/HBASE-23887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danil Lipovoy updated HBASE-23887:
----------------------------------
    Description: 
Hi!

This is my first time here, so apologies if I get something wrong.

I want to propose a way to improve performance when the data in HFiles is much
larger than the BlockCache (a common situation in big data). The idea is to
cache only part of the DATA blocks. This helps because LruBlockCache then
starts to work effectively and saves a huge amount of GC work. See the attached
picture, which shows the test described below: requests per second are higher
and GC is lower.

 

The key point of the code is a new parameter,
*hbase.lru.cache.data.block.percent*, which defaults to 100.

If it is set to a value from 0 to 99, the following logic applies:

 

 
{code:java}
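public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean inMemory) {
  // Skip caching a share of DATA blocks; other block types are always cached.
  if (cacheDataBlockPercent != 100 && buf.getBlockType().isData()) {
    // The block offset modulo 100 decides whether this block is cached.
    if (cacheKey.getOffset() % 100 >= cacheDataBlockPercent) {
      return;
    }
  }
  ...
  // the same code as usual
}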
{code}
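
To make the effect of the check easier to see, here is a minimal stand-alone sketch of the same offset test. It is not HBase code: the class name, the method names, and the synthetic offsets are hypothetical and only serve the illustration. It assumes that block offsets modulo 100 are spread roughly evenly, which is what makes the parameter behave like a percentage.

```java
import java.util.stream.LongStream;

/** Hypothetical stand-alone model of the proposed filter; not HBase code. */
final class DataBlockPercentSketch {

  /**
   * Mirrors the proposed check: a DATA block is cached only when its offset
   * modulo 100 is below the configured percent. Other blocks always pass.
   */
  static boolean shouldCache(long blockOffset, boolean isDataBlock,
      int cacheDataBlockPercent) {
    if (cacheDataBlockPercent != 100 && isDataBlock) {
      return blockOffset % 100 < cacheDataBlockPercent;
    }
    return true;
  }

  /** Counts how many of the given offsets would be cached as DATA blocks. */
  static long countCached(long[] offsets, int cacheDataBlockPercent) {
    long cached = 0;
    for (long offset : offsets) {
      if (shouldCache(offset, true, cacheDataBlockPercent)) {
        cached++;
      }
    }
    return cached;
  }

  public static void main(String[] args) {
    // Assumption: 10,000 consecutive synthetic offsets. Real HFile block
    // offsets depend on block sizes and may be distributed differently.
    long[] offsets = LongStream.range(0, 10_000).toArray();
    System.out.println(countCached(offsets, 20)); // prints 2000
  }
}
```

With these synthetic offsets a setting of 20 admits exactly 20% of the DATA blocks. Because the decision depends only on the offset, the same block always gets the same answer, so the cached subset stays stable across repeated reads.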
 

 

Description of the test:

3 nodes, E5-2698 v4 @ 2.20GHz, 700 GB of memory.

4 RegionServers

5 tables of 64 regions with 1.88 GB of data in each = 700 GB total (FAST_DIFF only)

Total BlockCache size = 38 GB

Random reads from 24 threads
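
To put the sizes above in proportion, the short sketch below computes how much of the data set the cache can hold. The class and method names are hypothetical, and the calculation assumes the stated cache and data sizes are directly comparable (it ignores index and bloom blocks and any per-block overhead).

```java
import java.util.Locale;

/** Hypothetical helper for the size arithmetic; not part of HBase. */
final class CacheRatioSketch {

  /** Share of the data set that fits in the block cache, as a percentage. */
  static double cacheSharePercent(double cacheSizeGb, double dataSizeGb) {
    return 100.0 * cacheSizeGb / dataSizeGb;
  }

  public static void main(String[] args) {
    // Figures from the test description: 38 GB of BlockCache, 700 GB of data.
    double share = cacheSharePercent(38, 700);
    System.out.println(String.format(Locale.ROOT, "%.1f%%", share)); // prints 5.4%
  }
}
```

Under these assumptions only about 5.4% of the data fits in the cache at any time, which is the situation the proposal targets.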

 

I am going to make a pull request; I hope that is the right way to contribute
to this cool product. Please correct me if something is wrong.

 

 


> BlockCache performance improve
> ------------------------------
>
>                 Key: HBASE-23887
>                 URL: https://issues.apache.org/jira/browse/HBASE-23887
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Danil Lipovoy
>            Priority: Minor
>         Attachments: 2jira.png
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
