[
https://issues.apache.org/jira/browse/HBASE-29012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17902451#comment-17902451
]
Wellington Chevreuil commented on HBASE-29012:
----------------------------------------------
Yeah, when compactions get delayed, there would be an extended "off-cache"
period for the blocks of referenced files. The logic behind HBASE-27474 was
aimed at use cases where the cache is already at capacity: caching blocks for
references, plus the extra caching done by compactions, would trigger mass
evictions. Since then, we have faced other scenarios where keeping the blocks
from split parents in the cache was preferable, so we implemented HBASE-28596,
which I think would solve this scenario. We need to backport HBASE-28596 to
branch-2.6.
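As a rough illustration of the trade-off described above (not the actual HBase
code; all names below are made up for the example), the HBASE-27474-era
decision boils down to skipping the block cache for blocks read from reference
or link hfiles while compactions are enabled, on the assumption that a
compaction will soon rewrite those files anyway:
{code:java}
/**
 * Minimal, self-contained sketch of the cache-on-read decision described
 * above. Illustrative only; the class and method names are not the real
 * HBase API.
 */
public class CacheOnReadSketch {

  /**
   * @param compactionsEnabled whether compactions are enabled for the store
   * @param isReferenceOrLink  whether the block comes from a reference or
   *                           link hfile (e.g. in a daughter region right
   *                           after a split, or after a merge)
   * @return whether the block should be added to the block cache on read
   */
  static boolean shouldCacheBlockOnRead(boolean compactionsEnabled,
                                        boolean isReferenceOrLink) {
    // HBASE-27474 behavior as described: don't cache blocks of referenced
    // files while a compaction is expected to rewrite them, to avoid mass
    // evictions when the cache is already at capacity.
    if (compactionsEnabled && isReferenceOrLink) {
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    // Scenario from this report: compactions enabled but delayed, so the
    // reference files (and their bloom blocks) stay uncached for a long time
    // and every hot read goes back to hdfs.
    System.out.println(shouldCacheBlockOnRead(true, true));  // false
    System.out.println(shouldCacheBlockOnRead(true, false)); // true
  }
}
{code}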
> Performance regression in hot reads after split/merge
> -----------------------------------------------------
>
> Key: HBASE-29012
> URL: https://issues.apache.org/jira/browse/HBASE-29012
> Project: HBase
> Issue Type: Bug
> Affects Versions: 3.0.0-beta-1, 2.6.1
> Reporter: Bryan Beaudreault
> Priority: Major
>
> We noticed a significant performance regression which comes from HBASE-27474.
> In that ticket, logic was added so that we don't cache blocks that exist
> within a reference or a link if compactions are enabled.
> The issue we noticed is that we had a cluster which had compactions enabled,
> but compactions were a bit delayed. During that time, there were some regions
> which had recently been split/merged and contained references. This cluster
> serves very hot reads and relies heavily on bloom filters. I noticed through
> profiles that we were spending a lot of time fetching BLOOM_CHUNK blocks from
> hdfs. This is almost never the case, since we continually right-size the block
> cache to ensure all blooms are cached. In fact, we had no evictions at the
> time. So why weren't they getting cached?
> With trace logging enabled I noticed that all of the blocks being read over
> and over happened to come from hfiles that looked to be references. This led
> me to the ticket in question.
> This feels like a very serious regression, as it leads to substantial impact
> to both hdfs and hbase in terms of request times and GC time, and the host
> becomes fully hosed. I sort of wonder if we should revert that issue, or at
> the very least make it configurable (a sketch of such a gate follows below).
> I'm not sure how to preserve the intended behavior of the ticket while also
> protecting the regionserver performance. In our case this happened for bloom
> blocks, but it could just as easily happen to a hot data block.
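As a rough sketch of the "make it configurable" suggestion in the description
above (the property name and default are invented for illustration and are not
an existing HBase configuration key), the same decision could be gated by a
flag that defaults to caching:
{code:java}
import java.util.Properties;

/**
 * Hypothetical configuration gate for caching blocks read from reference or
 * link hfiles. Illustrative only; not actual HBase code or configuration.
 */
public class ConfigurableReferenceCachingSketch {

  // Invented key: whether blocks from reference/link hfiles may be cached
  // even while compactions are enabled.
  static final String CACHE_REFERENCED_BLOCKS_KEY =
      "example.cache.blocks.from.references";

  static boolean shouldCacheBlockOnRead(Properties conf,
                                        boolean compactionsEnabled,
                                        boolean isReferenceOrLink) {
    boolean cacheReferencedBlocks = Boolean.parseBoolean(
        conf.getProperty(CACHE_REFERENCED_BLOCKS_KEY, "true"));
    if (compactionsEnabled && isReferenceOrLink && !cacheReferencedBlocks) {
      // HBASE-27474 behavior, but only when the operator opts into it.
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    // With the gate left at its default, a hot-read cluster with delayed
    // compactions keeps bloom/data blocks of reference files cacheable.
    System.out.println(shouldCacheBlockOnRead(conf, true, true)); // true
  }
}
{code}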
--
This message was sent by Atlassian Jira
(v8.20.10#820010)