[
https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694377#comment-13694377
]
Elliott Clark commented on HBASE-8370:
--------------------------------------
bq.Having a cache hit ratio of 80 % means that at least 80 % of my requests are
fast
I would disagree.
* Full handlers
* Giant gets of large amounts of data.
* Gets without a proper bloom filter.
* Things that skip past lots of (cached) blocks
* Slow data block encoding
* slow filters
* slow network
* lock contention
* GC
There are TONS of other reasons that your requests can be slow. And without
knowing the workload you can't tell if a cache miss is more or less likely than
any other explanation. I've seen workloads where the cache hit percent was in
the low teens and I've seen workloads where the cache hit percent was really
100%. There's no way a priori to know if a number is good or bad. So you are
again back to using the metrics with a baseline and comparing against it. For
that the absolute numbers are less important.
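As a rough illustration of judging a metric against its own baseline rather than against an absolute threshold, here is a minimal Python sketch. The function name, the sample values, and the 20% tolerance are all made up for illustration; none of this is HBase code.

```python
def deviates_from_baseline(current, baseline, tolerance=0.20):
    """Return True when `current` differs from `baseline` by more than
    `tolerance`, expressed as a fraction of the baseline."""
    if baseline == 0:
        # A relative change against a zero baseline is undefined;
        # treat any non-zero value as a deviation.
        return current != 0
    return abs(current - baseline) / abs(baseline) > tolerance


# A workload that normally runs at a 15% hit percent is fine at 14%...
print(deviates_from_baseline(current=14.0, baseline=15.0))  # False
# ...while one that normally runs at 99% is worth a look at 75%.
print(deviates_from_baseline(current=75.0, baseline=99.0))  # True
```

The same 75% would look "good" or "bad" depending only on what the workload normally does, which is why the absolute value carries little information on its own.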
bq.As far as derivatives go, Miss count derivative can go up with other things
like read request count
Yep, and that makes things harder, but the only metrics that aren't susceptible
to that are gauges. And like I said before, I'm trying to move us off of gauges.
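To make the derivative point concrete: a monotonically increasing counter becomes a rate by differencing two samples, and dividing the miss delta by the read request delta removes most of the effect of load. The sketch below is Python with a hypothetical `Sample` type and hypothetical field names; it is not the HBase metrics API.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One scrape of two monotonically increasing counters (hypothetical names)."""
    timestamp_s: float
    miss_count: int
    read_request_count: int


def misses_per_second(prev: Sample, cur: Sample) -> float:
    """Derivative of the miss counter; it rises whenever load rises."""
    elapsed = cur.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (cur.miss_count - prev.miss_count) / elapsed


def misses_per_read(prev: Sample, cur: Sample) -> float:
    """Miss delta divided by read delta, so a pure change in load cancels out."""
    reads = cur.read_request_count - prev.read_request_count
    if reads <= 0:
        return 0.0
    return (cur.miss_count - prev.miss_count) / reads


a = Sample(timestamp_s=0.0, miss_count=1_000, read_request_count=10_000)
b = Sample(timestamp_s=60.0, miss_count=1_600, read_request_count=16_000)
print(misses_per_second(a, b))  # 10.0
print(misses_per_read(a, b))    # 0.1
```

The ratio is only a rough normalization, since a single read can touch many blocks, but it is enough to separate "more misses because of more traffic" from "more misses per unit of traffic".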
bq.I dont know the number of cache misses for Index block vs Data block vs
Bloom block. I would no longer know how many Data blocks are being accessed and
how many Index blocks etc
But those aren't actionable metrics.
* If your bloom block cache hit count goes down you can do... not much. It's
not worth counting if you can't take action on it.
* With the way the index blocks work you can't cache miss them after the
first time, unless we're OOM (they aren't ever evicted, even if you turn off
caching on the CF). So you'll see some misses on region open and any time
there's a new flush or compaction. Otherwise the hit rate will be 100%.
Compaction and flush metrics are much more useful for determining this kind of
thing, so there's no need to add more metrics for something that's better
covered somewhere else.
* So data blocks are the only useful ones, and they dominate the number of
blocks requested. So this can be pretty well covered by the following:
** blockCacheExpressHitPercent
** blockCountHitPercent
** blockCacheHitCount
** blockCacheMissCount
I'm -1 on adding any more metrics on the read path unless there's something
that's totally missed (Jeremy brought up a couple the last time I met with
him). That code is just too important to be instrumented any more for things
that can be figured out other ways (and I would argue better ways, but that's
less important).
I'm +1 on making that cache hit percent a double so there's more accuracy.
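A small Python sketch of why a double helps here. It assumes the integer version truncates, which may not match what the region server actually does; the function names and counts are invented for illustration.

```python
def hit_percent_int(hits: int, misses: int) -> int:
    """Hit percent as an integer, truncated (an assumption, see above)."""
    total = hits + misses
    return 0 if total == 0 else hits * 100 // total


def hit_percent_float(hits: int, misses: int) -> float:
    """Hit percent as a floating-point value."""
    total = hits + misses
    return 0.0 if total == 0 else hits * 100.0 / total


# Two workloads whose miss counts differ by 9x look identical as integers.
print(hit_percent_int(9_990, 10), hit_percent_float(9_990, 10))  # 99 99.9
print(hit_percent_int(9_910, 90), hit_percent_float(9_910, 90))  # 99 99.1
```

At high hit percents most of the interesting movement is in the fractional part, so an integer value hides exactly the changes a baseline comparison would want to see.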
> Report data block cache hit rates apart from aggregate cache hit rates
> ----------------------------------------------------------------------
>
> Key: HBASE-8370
> URL: https://issues.apache.org/jira/browse/HBASE-8370
> Project: HBase
> Issue Type: Improvement
> Components: metrics
> Reporter: Varun Sharma
> Assignee: Varun Sharma
> Priority: Minor
>
> Attaching from mail to [email protected]
> I am wondering whether the HBase cachingHitRatio metrics that the region
> server UI shows, can get me a break down by data blocks. I always see this
> number to be very high and that could be exaggerated by the fact that each
> lookup hits the index blocks and bloom filter blocks in the block cache
> before retrieving the data block. This could be artificially bloating up the
> cache hit ratio.
> Assuming the above is correct, do we already have a cache hit ratio for data
> blocks alone, which is more obscure? If not, my sense is that it would be
> pretty valuable to add one.