[jira] [Commented] (HBASE-8370) Report data block cache hit rates apart from aggregate cache hit rates

Elliott Clark (JIRA) Wed, 26 Jun 2013 17:50:16 -0700

    [ https://issues.apache.org/jira/browse/HBASE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694377#comment-13694377 ]


Elliott Clark commented on HBASE-8370:
--------------------------------------

bq.Having a cache hit ratio of 80 % means that at least 80 % of my requests are 
fast
I would disagree. Any of these can make a request slow even when the cache hits:

* Full handlers
* Giant gets of large amounts of data
* Gets without a proper bloom filter
* Things that skip past lots of (cached) blocks
* Slow data block encoding
* Slow filters
* Slow network
* Lock contention
* GC

There are TONS of other reasons that your requests can be slow.  And without 
knowing the workload you can't tell whether a cache miss is more or less likely 
than any other explanation.  I've seen workloads where the cache hit percent was 
in the low teens and I've seen workloads where it really was 100%.  
There's no way to know a priori whether a number is good or bad.  So you are 
again back to comparing the metrics against a baseline.  For that, the 
absolute numbers are less important.
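
As a rough sketch of what comparing against a baseline could look like (Python, purely illustrative; the function name and the three-standard-deviation threshold are assumptions, not anything HBase ships):

```python
from statistics import mean, pstdev


def deviates_from_baseline(current, baseline, num_stdevs=3.0):
    """Return True when `current` is far from this workload's own history.

    `baseline` is a list of earlier samples of the same metric (for example
    a cache hit percent) taken while the workload was known to be healthy.
    """
    center = mean(baseline)
    spread = pstdev(baseline)
    # A perfectly flat baseline has no spread, so any change is notable.
    if spread == 0:
        return current != center
    return abs(current - center) > num_stdevs * spread


# Two hypothetical workloads with very different "normal" hit percents.
low_teens = [14.0, 15.0, 13.0, 14.0]
near_full = [99.0, 100.0, 99.5, 99.5]
```

With these made-up samples, a reading of 14.5 is unremarkable for the first workload, while 80.0 is an anomaly for both. The same absolute number means different things depending on the baseline.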


bq.As far as derivatives go, Miss count derivative can go up with other things 
like read request count
Yep, and that makes things harder, but the only things that aren't susceptible 
to that are gauges.  And like I said before, I'm trying to move us off of gauges.
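
One way to separate the two effects, sketched in Python under assumptions: the samples are plain dicts of monotonically increasing counters, and `readRequestCount` is an assumed name for the read request counter. Nothing here is HBase code.

```python
def counter_rates(prev, curr, interval_seconds):
    """Turn two samples of increasing counters into per-interval rates."""
    miss_delta = curr["blockCacheMissCount"] - prev["blockCacheMissCount"]
    read_delta = curr["readRequestCount"] - prev["readRequestCount"]
    return {
        # Raw derivative: rises whenever load rises.
        "misses_per_second": miss_delta / interval_seconds,
        # Normalized by reads: stays flat if only the load changed.
        "misses_per_read": miss_delta / read_delta if read_delta else 0.0,
    }


# Hypothetical samples taken 60 seconds apart; load doubles in the second interval.
t0 = {"blockCacheMissCount": 1000, "readRequestCount": 10000}
t1 = {"blockCacheMissCount": 1600, "readRequestCount": 16000}
t2 = {"blockCacheMissCount": 2800, "readRequestCount": 28000}

first = counter_rates(t0, t1, 60)
second = counter_rates(t1, t2, 60)
```

In this made-up data the miss derivative doubles between the two intervals while the misses per read stay the same, which points at load rather than at the cache.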

bq.I don't know the number of cache misses for Index block vs Data block vs 
Bloom block. I would no longer know how many Data blocks are being accessed and 
how many Index blocks etc
But those aren't actionable metrics.  

* If your bloom block cache hit count goes down you can do... not much. It's 
not worth counting if you can't take action on it.
* With the way the index blocks work you can't cache miss them after the 
first time unless we're OOM (they aren't ever evicted, even if you turn off 
caching on the CF).  So you'll see some misses on region open and any time 
there's a new flush or compaction; otherwise the hit rate will be 100%.  
Compaction and flush metrics are much more useful for determining this kind of 
thing, so there's no need to add more metrics for something that's better 
covered somewhere else.
* So data blocks are the only useful ones, and they dominate the number of 
blocks requested.  So this can be covered pretty well by the following:
** blockCacheExpressHitPercent
** blockCountHitPercent
** blockCacheHitCount
** blockCacheMissCount
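
For instance, a hit percent for any time window can be derived from the two counters alone. A minimal Python sketch, assuming the counters are sampled into plain dicts (the function name and the sample values are hypothetical):

```python
def interval_hit_percent(prev, curr):
    """Hit percent for the window between two counter samples."""
    hits = curr["blockCacheHitCount"] - prev["blockCacheHitCount"]
    misses = curr["blockCacheMissCount"] - prev["blockCacheMissCount"]
    total = hits + misses
    # No block requests in the window, so there is no ratio to report.
    if total == 0:
        return None
    return 100.0 * hits / total


# Hypothetical samples: a long healthy history, then a bad recent window.
earlier = {"blockCacheHitCount": 9000, "blockCacheMissCount": 1000}
later = {"blockCacheHitCount": 9900, "blockCacheMissCount": 1500}

recent = interval_hit_percent(earlier, later)
```

Here the recent window works out to roughly 64%, even though the totals since startup (9900 hits out of 11400 requests) would still show about 87%.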

I'm -1 on adding any more metrics on the read path unless there's something 
that's totally missed (Jeremy brought up a couple the last time I met with 
him).  That code is just too important to be instrumented any more for things 
that can be figured out other ways (and I would argue better ways, but that's 
less important).

I'm +1 on making that cache hit percent a double so there's more accuracy.
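
To illustrate why the extra precision matters, here is a small Python sketch. It assumes the integer percent is produced by truncating division, which is an assumption for illustration and not a statement about the current HBase code:

```python
def hit_percent_int(hits, misses):
    """Whole-number percent, truncated (assumed integer behavior)."""
    return (100 * hits) // (hits + misses)


def hit_percent_double(hits, misses):
    """Floating-point percent that keeps the fractional part."""
    return 100.0 * hits / (hits + misses)


# Two hypothetical workloads whose miss rates differ by roughly 10x.
a_int = hit_percent_int(9990, 10)
b_int = hit_percent_int(9901, 99)
a_dbl = hit_percent_double(9990, 10)
b_dbl = hit_percent_double(9901, 99)
```

Both integer values come out as 99, while the doubles are about 99.9 and 99.01, so only the double form shows that one workload misses about ten times as often as the other.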
                
> Report data block cache hit rates apart from aggregate cache hit rates
> ----------------------------------------------------------------------
>
>                 Key: HBASE-8370
>                 URL: https://issues.apache.org/jira/browse/HBASE-8370
>             Project: HBase
>          Issue Type: Improvement
>          Components: metrics
>            Reporter: Varun Sharma
>            Assignee: Varun Sharma
>            Priority: Minor
>
> Attaching from mail to d...@hbase.apache.org
> I am wondering whether the HBase cachingHitRatio metrics that the region 
> server UI shows can get me a breakdown by data blocks. I always see this 
> number to be very high, and that could be exaggerated by the fact that each 
> lookup hits the index blocks and bloom filter blocks in the block cache 
> before retrieving the data block. This could be artificially bloating up the 
> cache hit ratio.
> Assuming the above is correct, do we already have a cache hit ratio for data 
> blocks alone which is more obscure? If not, my sense is that it would be 
> pretty valuable to add one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
