[
https://issues.apache.org/jira/browse/HBASE-23887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044154#comment-17044154
]
Reid Chan commented on HBASE-23887:
-----------------------------------
{code}if (cacheKey.getOffset() % 100 >= cacheDataBlockPercent){code}
It only caches fixed blocks, not a truly random x percent of blocks. It is a
simple mathematical problem.
That means, in some cases, if those uncached blocks are read much more than
cached blocks, the req/secs should decrease? cacheMiss ratio grow? latency grow?
Better to give explanation why does it contribute to req/sec grows? We, as
project maintainers, need to know that and so do users. It is not very
intuitive to me.
Also need to update hbase doc, in what situation should user tune this
parameter?
Up till now, my point towards this issue is, it is not a general improvement
solution.
> BlockCache performance improve
> ------------------------------
>
> Key: HBASE-23887
> URL: https://issues.apache.org/jira/browse/HBASE-23887
> Project: HBase
> Issue Type: New Feature
> Reporter: Danil Lipovoy
> Priority: Minor
> Attachments: cmp.png
>
>
> Hi!
> I first time here, correct me please if something wrong.
> I want propose how to improve performance when data in HFiles much more than
> BlockChache (usual story in BigData). The idea - caching only part of DATA
> blocks. It is good becouse LruBlockCache starts to work and save huge amount
> of GC. See the picture in attachment with test below. Requests per second is
> higher, GC is lower.
>
> The key point of the code:
> Added the parameter: *hbase.lru.cache.data.block.percent* which by default =
> 100
>
> But if we set it 0-99, then will work the next logic:
>
>
> {code:java}
> public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean
> inMemory) {
> if (cacheDataBlockPercent != 100 && buf.getBlockType().isData())
> if (cacheKey.getOffset() % 100 >= cacheDataBlockPercent)
> return;
> ...
> // the same code as usual
> }
> {code}
>
>
> Descriptions of the test:
> 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem.
> 4 RegionServers
> 4 tables by 64 regions by 1.88 Gb data in each = 600 Gb total (only FAST_DIFF)
> Total BlockCache Size = 48 Gb (8 % of data in HFiles)
> Random read in 20 threads
>
> I am going to make Pull Request, hope it is right way to make some
> contribution in this cool product.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)