[ https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15030429#comment-15030429 ]
Vikas Vishwakarma commented on HBASE-14869:
-------------------------------------------
[~lhofhansl], please review the attached patch (14869-v2-0.98.txt) and confirm
whether this approach is OK. If it is, I will make the changes for all the
metrics and for hadoop1 and submit the final patch.
Changes done:
- Created separate classes, MutableTimeHistogram.java and MutableSizeHistogram.java.
- Moved the common code for the min, mean, max and count stats into
  MutableStats.java, leaving the snapshot-related code to each specific
  implementation.
- Added integration for the new metric types in DynamicMetricsRegistry.
- So far I have switched only a couple of metrics to time/size-based histograms
  (APPEND_SIZE and APPEND_TIME in MetricsWALSourceImpl, and GET_KEY in
  MetricsRegionServerSourceImpl); a snapshot is attached.
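The refactoring in the list above might look roughly like the following. This
is a simplified, self-contained sketch and not the code in 14869-v2-0.98.txt:
it does not use the Hadoop metrics2 classes, the method names (add,
updateRanges, bucketIndex, rangeCounts) are hypothetical, the time bounds reuse
the example bands from the issue description, and the size bounds are made up
for illustration.

```java
import java.util.Arrays;

/**
 * Common min/mean/max/count bookkeeping, i.e. the role the comment gives to
 * MutableStats. Simplified stand-in: plain longs, coarse locking, no metrics2.
 */
abstract class MutableStats {
  private long count;
  private long sum;
  private long min = Long.MAX_VALUE;
  private long max = Long.MIN_VALUE;

  /** Records one observation, then lets the subclass update its ranges. */
  public synchronized void add(long value) {
    count++;
    sum += value;
    min = Math.min(min, value);
    max = Math.max(max, value);
    updateRanges(value);
  }

  /** Subclass-specific part (range counts / snapshot). Called under the lock. */
  protected abstract void updateRanges(long value);

  /** Index of the first bucket whose inclusive upper bound is >= value. */
  protected static int bucketIndex(long[] upperBounds, long value) {
    int i = 0;
    while (i < upperBounds.length && value > upperBounds[i]) {
      i++;
    }
    return i; // == upperBounds.length for the open-ended last bucket
  }

  public synchronized long getCount() { return count; }
  public synchronized long getMin() { return count == 0 ? 0 : min; }
  public synchronized long getMax() { return count == 0 ? 0 : max; }
  public synchronized double getMean() { return count == 0 ? 0.0 : (double) sum / count; }
}

/** Latency histogram: counts per time range, in milliseconds. */
final class MutableTimeHistogram extends MutableStats {
  // Upper bounds taken from the example bands in the issue description.
  static final long[] BOUNDS_MS = {5, 10, 20, 50, 100, 1000};
  private final long[] counts = new long[BOUNDS_MS.length + 1];

  @Override
  protected void updateRanges(long millis) {
    counts[bucketIndex(BOUNDS_MS, millis)]++;
  }

  public synchronized long[] rangeCounts() { return counts.clone(); }
}

/** Size histogram: counts per size range, in bytes. */
final class MutableSizeHistogram extends MutableStats {
  // Hypothetical byte bounds, for illustration only.
  static final long[] BOUNDS_BYTES = {10, 100, 1_000, 10_000, 100_000, 1_000_000};
  private final long[] counts = new long[BOUNDS_BYTES.length + 1];

  @Override
  protected void updateRanges(long bytes) {
    counts[bucketIndex(BOUNDS_BYTES, bytes)]++;
  }

  public synchronized long[] rangeCounts() { return counts.clone(); }
}

class HistogramDemo {
  public static void main(String[] args) {
    MutableTimeHistogram appendTime = new MutableTimeHistogram();
    for (long ms : new long[] {2, 7, 7, 45, 1500}) {
      appendTime.add(ms);
    }
    System.out.println(appendTime.getCount() + " requests, mean " + appendTime.getMean()
        + " ms, per-range counts " + Arrays.toString(appendTime.rangeCounts()));
  }
}
```

Keeping the bucket bounds in the subclasses lets time and size histograms use
different ranges while sharing the basic stats; making the bounds configurable,
as the issue suggests, is left out of this sketch.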
Also, some metric names are plain (like Get) while others already have
Time/Size appended (like AppendTime, AppendSize). Currently I add a
_TimeCount_ / _SizeCount_ suffix to the metric names, but I will probably
change it to just _RangeCount_ or something similar. Thoughts?
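To make the naming question concrete, the hypothetical helper below builds one
counter name per range by concatenating the metric name, a suffix and the range
bounds. The name format (lower-upper, with "inf" for the open-ended last range)
is an assumption for illustration, not necessarily what the attached snapshot
shows.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that derives one counter name per range. */
final class RangeMetricNames {
  private RangeMetricNames() {}

  /**
   * Builds names such as "Get" + "_TimeCount_" + "0-5". The last,
   * open-ended range is written as "<lastBound>-inf".
   */
  static List<String> names(String baseName, String suffix, long[] upperBounds) {
    List<String> names = new ArrayList<>();
    long lower = 0;
    for (long upper : upperBounds) {
      names.add(baseName + suffix + lower + "-" + upper);
      lower = upper;
    }
    names.add(baseName + suffix + lower + "-inf");
    return names;
  }

  public static void main(String[] args) {
    long[] timeBounds = {5, 10, 20, 50, 100, 1000};
    // Current scheme: "Time" is repeated for metrics that already end in "Time".
    System.out.println(names("Get", "_TimeCount_", timeBounds).get(0));        // Get_TimeCount_0-5
    System.out.println(names("AppendTime", "_TimeCount_", timeBounds).get(0)); // AppendTime_TimeCount_0-5
    // Generic suffix under discussion.
    System.out.println(names("AppendTime", "_RangeCount_", timeBounds).get(6)); // AppendTime_RangeCount_1000-inf
  }
}
```

One trade-off: a generic _RangeCount_ avoids the repetition in names like
AppendTime_TimeCount_..., but for a plain name such as Get it no longer says
whether the ranges are times or sizes, so the unit would have to be conveyed
elsewhere (for example in the metric description).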
> Better request latency histograms
> ---------------------------------
>
> Key: HBASE-14869
> URL: https://issues.apache.org/jira/browse/HBASE-14869
> Project: HBase
> Issue Type: Brainstorming
> Reporter: Lars Hofhansl
> Assignee: Lars Hofhansl
> Attachments: 14869-test-0.98.txt, 14869-v1-0.98.txt, 14869-v2-0.98.txt
>
>
> I just discussed this with a colleague.
> The get, put, etc. histograms that each region server keeps are somewhat
> useless (depending on what you want to achieve, of course), because they are
> aggregated and calculated by each region server individually.
> It would be better to record the number of requests in certain latency
> bands in addition to what we do now.
> For example, the number of gets that took 0-5ms, 5-10ms, 10-20ms, 20-50ms,
> 50-100ms, 100-1000ms, > 1000ms, etc. (just as an example, should be
> configurable).
> That way we can do further calculations after the fact and answer questions
> like: How often did we miss our SLA? What percentage of requests missed an
> SLA? And so on.
> Comments?
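The idea in the issue description can be sketched in a few lines: each server
keeps a count per configurable latency band, counts from several servers can
simply be added (unlike per-server percentiles, which in general cannot be
combined), and the share of requests over an SLA follows directly from the
counts. This is an illustrative sketch, assuming inclusive band upper bounds of
5, 10, 20, 50, 100 and 1000 ms and an SLA that coincides with a band boundary;
the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical per-server counter of requests per latency band. */
final class LatencyBands {
  private final long[] upperBoundsMs; // sorted, inclusive upper bound of each band
  private final long[] counts;        // one extra slot for "slower than the last bound"

  LatencyBands(long... upperBoundsMs) {
    this.upperBoundsMs = upperBoundsMs.clone();
    Arrays.sort(this.upperBoundsMs);
    this.counts = new long[this.upperBoundsMs.length + 1];
  }

  /** Increments the band that contains latencyMs (upper bounds are inclusive). */
  void record(long latencyMs) {
    int i = Arrays.binarySearch(upperBoundsMs, latencyMs);
    // Exact match: that band. Otherwise the insertion point is the first larger bound.
    counts[i >= 0 ? i : -i - 1]++;
  }

  /** Band counts from different servers (same bounds) can simply be added. */
  void mergeFrom(LatencyBands other) {
    if (!Arrays.equals(upperBoundsMs, other.upperBoundsMs)) {
      throw new IllegalArgumentException("band boundaries differ");
    }
    for (int i = 0; i < counts.length; i++) {
      counts[i] += other.counts[i];
    }
  }

  /** Fraction of requests slower than slaMs; slaMs must equal one of the bounds. */
  double fractionOver(long slaMs) {
    int slaBand = Arrays.binarySearch(upperBoundsMs, slaMs);
    if (slaBand < 0) {
      throw new IllegalArgumentException("SLA must match a band boundary");
    }
    long total = 0;
    long over = 0;
    for (int i = 0; i < counts.length; i++) {
      total += counts[i];
      if (i > slaBand) {
        over += counts[i];
      }
    }
    return total == 0 ? 0.0 : (double) over / total;
  }
}

class SlaDemo {
  public static void main(String[] args) {
    LatencyBands rs1 = new LatencyBands(5, 10, 20, 50, 100, 1000);
    LatencyBands rs2 = new LatencyBands(5, 10, 20, 50, 100, 1000);
    for (long ms : new long[] {3, 4, 8, 60}) {
      rs1.record(ms);
    }
    for (long ms : new long[] {2, 150, 1200, 9}) {
      rs2.record(ms);
    }
    rs1.mergeFrom(rs2); // combined view of both servers
    // 150 ms and 1200 ms exceed a 100 ms SLA: 2 of 8 requests.
    System.out.println(rs1.fractionOver(100)); // 0.25
  }
}
```

Using inclusive upper bounds means adjacent bands neither overlap nor leave
gaps, and requiring the SLA to sit on a band boundary keeps the answer exact;
an SLA that falls inside a band could only be estimated from these counts.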
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)