[
https://issues.apache.org/jira/browse/HBASE-16287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15395206#comment-15395206
]
ramkrishna.s.vasudevan commented on HBASE-16287:
------------------------------------------------
bq.It's possible that the loading speed is higher than eviction in some cases
like with PCIe SSD, according to our observation.
I agree with this. In the case of the L2 cache configured in FileMode with HDDs,
the caching is not fast enough, so the RAMQueue fills up and no new blocks can be
added to it. Hence the failedInsertion rate tends to be higher, whereas with PCIe
SSDs the failed insertion rate is lower (in both cases the writer thread count was
increased).
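To make the failure mode concrete, here is a minimal sketch (hypothetical names,
not the actual BucketCache code) of how a bounded RAM queue turns slow draining
into failed insertions:
{code:borderStyle=solid}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration: the producer enqueues blocks without blocking. If
// the writer thread (the consumer) drains slower than blocks arrive, offer()
// starts returning false and the block is simply not cached.
public class RamQueueSketch {
  private final BlockingQueue<byte[]> ramQueue = new ArrayBlockingQueue<>(64);
  private final AtomicLong failedInsertions = new AtomicLong();

  /** Producer side: non-blocking enqueue of a block destined for the L2 cache. */
  public boolean cacheBlock(byte[] block) {
    if (!ramQueue.offer(block)) {         // queue full: the writer is too slow (e.g. HDD)
      failedInsertions.incrementAndGet(); // shows up as a high failed-insertion rate
      return false;
    }
    return true;
  }
}
{code}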
bq.Then those are moved to buckets. I had noticed this issue and thought of
handling it but kept forgetting.. Now remember again..
I think that for L2 the caching throughput depends on how fast the draining
happens from the RAMQueue. Also, when a block cannot be written by the writer
threads, doesn't the code already handle the cache-full exception by trying to
free some space?
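For reference, that drain side can be pictured roughly like this (a paraphrased
sketch, assuming a cache-full signal and a freeSpace() helper; the real
WriterThread logic differs in detail):
{code:borderStyle=solid}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

// Paraphrased sketch of a writer thread draining the RAM queue. On a cache-full
// condition it frees space and retries, so overall caching throughput depends on
// how fast this loop can drain relative to the insertion rate.
class WriterThreadSketch implements Runnable {
  private final BlockingQueue<byte[]> ramQueue;

  WriterThreadSketch(BlockingQueue<byte[]> ramQueue) {
    this.ramQueue = ramQueue;
  }

  @Override
  public void run() {
    List<byte[]> batch = new ArrayList<>();
    try {
      while (!Thread.currentThread().isInterrupted()) {
        batch.add(ramQueue.take());  // block until at least one entry is available
        ramQueue.drainTo(batch);     // then grab whatever else is already queued
        for (byte[] block : batch) {
          try {
            writeToBuckets(block);
          } catch (IllegalStateException cacheFull) { // stand-in for CacheFullException
            freeSpace();             // evict to make room, then retry once
            writeToBuckets(block);
          }
        }
        batch.clear();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  private void writeToBuckets(byte[] block) { /* write into the bucket allocator */ }
  private void freeSpace() { /* evict blocks to free space */ }
}
{code}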
> BlockCache size should not exceed acceptableSize too many
> ---------------------------------------------------------
>
> Key: HBASE-16287
> URL: https://issues.apache.org/jira/browse/HBASE-16287
> Project: HBase
> Issue Type: Improvement
> Components: BlockCache
> Reporter: Yu Sun
>
> Our regionserver has a configuration as below:
> -Xmn4g -Xms32g -Xmx32g -XX:SurvivorRatio=2 -XX:+UseConcMarkSweepGC
> We also only use the block cache, and set hfile.block.cache.size = 0.3 in
> hbase-site.xml, so under this configuration the LRU block cache size will be
> (32g-1g)*0.3 = 9.3g. But in some scenarios some of the regionservers run into
> continuous full GCs for hours and, most importantly, after the full GC most of
> the objects in the old generation are not collected. So we dumped the heap,
> analysed it with MAT, and observed an obvious memory leak in LruBlockCache,
> which occupied about 16g of memory. We then set the LruBlockCache log level to
> TRACE and observed this in the log:
> {quote}
> 2016-07-22 12:17:58,158 INFO [LruBlockCacheStatsExecutor]
> hfile.LruBlockCache: totalSize=15.29 GB, freeSize=-5.99 GB, max=9.30 GB,
> blockCount=628182, accesses=101799469125, hits=93517800259, hitRatio=91.86%,
> , cachingAccesses=99462650031, cachingHits=93468334621,
> cachingHitsRatio=93.97%, evictions=238199, evicted=4776350518,
> evictedPerRun=20051.93359375{quote}
> We can see the block cache size has exceeded acceptableSize by far too much,
> which makes the full GC problem even more serious.
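> To make the overshoot concrete, the sizing arithmetic above can be checked
> directly against the log line (plain numbers, not HBase code):
> {code:borderStyle=solid}
> public class CacheSizeCheck {
>   public static void main(String[] args) {
>     double heapGb = 32.0;
>     double reservedGb = 1.0;    // the calculation above uses (32g - 1g)
>     double cacheFraction = 0.3; // hfile.block.cache.size
>
>     double maxCacheGb = (heapGb - reservedGb) * cacheFraction; // = 9.3, matches max=9.30 GB
>     double totalSizeGb = 15.29;                                // from the TRACE log above
>     double freeGb = maxCacheGb - totalSizeGb;                  // = -5.99, matches freeSize
>
>     System.out.printf("max=%.2f GB, total=%.2f GB, free=%.2f GB%n",
>         maxCacheGb, totalSizeGb, freeGb);
>   }
> }
> {code}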
> After some investigation, I found that in this function:
> {code:borderStyle=solid}
> public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean inMemory,
>     final boolean cacheDataInL1) {
> {code}
> No matter how much of the block cache is already used, the block is simply put
> into it. But if the eviction thread is not fast enough, the block cache size
> will grow significantly.
> So here I think we should have a check: for example, if the block cache size >
> 1.2 * acceptableSize(), just return and don't put the block into the cache
> until the size is back under the watermark. If this sounds reasonable, I can
> make a small patch for it.
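> A minimal sketch of the proposed check could look like this (the 1.2 factor
> and the early return are the proposal, not current behavior; the failInsert()
> counter is an assumption about the stats class):
> {code:borderStyle=solid}
> // Hypothetical guard at the top of LruBlockCache.cacheBlock(): refuse new
> // blocks once the cache has overshot acceptableSize() by 20%, instead of
> // letting the size run arbitrarily ahead of the eviction thread.
> public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean inMemory,
>     final boolean cacheDataInL1) {
>   long hardLimit = (long) (1.2 * acceptableSize());
>   if (size.get() >= hardLimit) {
>     stats.failInsert(); // assumed counter; record that we skipped caching
>     return;             // skip caching until eviction brings the size back down
>   }
>   // ... existing caching logic ...
> }
> {code}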
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)