[
https://issues.apache.org/jira/browse/HBASE-23887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044323#comment-17044323
]
Reid Chan edited comment on HBASE-23887 at 2/25/20 11:17 AM:
-------------------------------------------------------------
Then the name *hbase.lru.cache.data.block.percent* is not good; it is
confusing.
One case: if a client happens to read the block at offset 198 many times, 198
will never get cached because of the rule (198 % 100 = 98, which is >= 85).
Another, *extreme*, case: if the last two digits of every offset fall within
[00, 84] (assuming 85 is set), then BC will cache them all anyway...
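Both cases can be checked with a small stand-alone sketch. It only mirrors the
modulo check quoted in the issue description below; the class name
DataBlockPercentRule, the method shouldCache and the sample offsets are
hypothetical and not part of HBase.

```java
// Hypothetical stand-alone class: it only mirrors the modulo check from the
// issue description and is not HBase code.
final class DataBlockPercentRule {

    // Returns true when a DATA block at this offset would be cached under the
    // proposed rule, i.e. when (offset % 100) < percent.
    static boolean shouldCache(long offset, int percent) {
        if (percent == 100) {
            return true; // default value: the filter is disabled
        }
        return offset % 100 < percent;
    }

    public static void main(String[] args) {
        int percent = 85; // assumed setting, as in the cases above

        // Case 1: 198 % 100 = 98 >= 85, so the block is skipped on every read,
        // no matter how often it is requested.
        System.out.println("offset 198 cached: " + shouldCache(198, percent));

        // Case 2: made-up offsets whose last two digits are all below 85;
        // every one of them is admitted, so nothing is filtered out.
        long[] offsets = {100L, 1_042L, 20_084L};
        for (long offset : offsets) {
            System.out.println("offset " + offset + " cached: "
                + shouldCache(offset, percent));
        }
    }
}
```

So the fraction of blocks actually cached depends on how the offsets are
distributed, not only on the configured value.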
From my understanding, this issue's idea is based on:
1. caching less data in memory to lighten the burden on GC.
2. less caching and eviction, to spare more CPU cycles.
But is it possible to not only satisfy the need to cache the most frequently
read data, but also to skip caching parts of the data according to some sort
of rule?
was (Author: reidchan):
Then the name *hbase.lru.cache.data.block.percent* is not good; it is
confusing.
One case: if a client happens to read the block at offset 198 many times, 198
will never get cached because of the rule.
Another, *extreme*, case: if the last two digits of every offset fall within
[00, 84] (assuming 85 is set), then BC will cache them all anyway...
So,
1. caching less data lightens the burden on GC.
2. less caching and eviction spares more CPU cycles.
I get your idea, but is it possible to not only satisfy the need to cache the
most frequently read data, but also to skip caching parts of the data
according to some sort of rule?
> BlockCache performance improve
> ------------------------------
>
> Key: HBASE-23887
> URL: https://issues.apache.org/jira/browse/HBASE-23887
> Project: HBase
> Issue Type: Improvement
> Components: BlockCache, Performance
> Reporter: Danil Lipovoy
> Priority: Minor
> Attachments: cmp.png
>
>
> Hi!
> This is my first time here, so please correct me if something is wrong.
> I want to propose a way to improve performance when the data in HFiles is much
> larger than the BlockCache (the usual story in BigData). The idea is to cache
> only part of the DATA blocks. This is good because LruBlockCache starts to
> work and saves a huge amount of GC. See the picture in the attachment for the
> test below. Requests per second are higher and GC is lower.
>
> The key point of the code:
> Added the parameter *hbase.lru.cache.data.block.percent*, which defaults to
> 100.
>
> But if we set it to 0-99, the following logic applies:
>
>
> {code:java}
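> public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean inMemory) {
>   // Skip caching a DATA block when the last two digits of its offset are
>   // greater than or equal to the configured percentage.
>   if (cacheDataBlockPercent != 100 && buf.getBlockType().isData()) {
>     if (cacheKey.getOffset() % 100 >= cacheDataBlockPercent) {
>       return;
>     }
>   }
>   ...
>   // the same code as usual
> }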
> {code}
>
>
> Description of the test:
> 4 nodes E5-2698 v4 @ 2.20GHz, 700 GB Mem.
> 4 RegionServers
> 4 tables by 64 regions by 1.88 GB data in each = 600 GB total (only FAST_DIFF)
> Total BlockCache Size = 48 GB (8 % of data in HFiles)
> Random reads in 20 threads
>
> I am going to make a pull request; I hope this is the right way to contribute
> to this cool product.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)