[ 
https://issues.apache.org/jira/browse/HBASE-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039678#comment-15039678
 ] 

Vikas Vishwakarma commented on HBASE-14869:
-------------------------------------------

[~apurtell] thanks for the review. We do not have Splunk forwarders for the 
test env, but we already have daily automation scripts running on production 
logs that extract operation latencies such as Mutate_mean and 
Mutate_95th_percentile from the periodic HBase metrics dump. Since this change 
only adds to that metric list, we can easily collect the new metrics with the 
same scripts. 
So far I have tested this only locally on a dev setup. I will set it up on a 
full cluster and run some long-running, high-load tests to check for perf 
impact, CPU usage, etc., and then post the test results. Sounds ok? 
If the naming convention or the range values used for these metrics need to 
be changed, I can update the patch based on your suggestions.

> Better request latency histograms
> ---------------------------------
>
>                 Key: HBASE-14869
>                 URL: https://issues.apache.org/jira/browse/HBASE-14869
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Lars Hofhansl
>            Assignee: Vikas Vishwakarma
>             Fix For: 2.0.0, 1.3.0, 0.98.17
>
>         Attachments: 14869-test-0.98.txt, 14869-v1-0.98.txt, 
> 14869-v1-2.0.txt, 14869-v2-0.98.txt, 14869-v2-2.0.txt, 14869-v3-0.98.txt, 
> 14869-v4-0.98.txt, 14869-v5-0.98.txt, AppendSizeTime.png, Get.png
>
>
> I just discussed this with a colleague.
> The get, put, etc, histograms that each region server keeps are somewhat 
> useless (depending on what you want to achieve of course), as they are 
> aggregated and calculated by each region server.
> It would be better to record the number of requests in certain latency 
> bands in addition to what we do now.
> For example the number of gets that took 0-5ms, 6-10ms, 10-20ms, 20-50ms, 
> 50-100ms, 100-1000ms, > 1000ms, etc. (just as an example, should be 
> configurable).
> That way we can do further calculations after the fact, and answer questions 
> like: How often did we miss our SLA? Percentage of requests that missed an 
> SLA, etc.
> Comments?
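
To illustrate why per-band counts help here: unlike percentiles computed on 
each region server, plain counts can be summed across servers and queried 
afterwards. The following is a minimal sketch in Python, not the code from the 
attached patches. The class name LatencyBands, its methods, and the choice to 
treat each band's upper bound as inclusive are all assumptions made for 
illustration; the boundaries follow the example bands in the description.

```python
from bisect import bisect_left

# Upper bounds (ms) of the example bands from the issue description.
# Anything above the last bound falls into an open-ended "> 1000ms" band.
# Treating each upper bound as inclusive is an assumption of this sketch.
BAND_UPPER_MS = [5, 10, 20, 50, 100, 1000]


class LatencyBands:
    """Hypothetical per-server counter of requests per latency band."""

    def __init__(self, upper_bounds_ms=BAND_UPPER_MS):
        self.bounds = list(upper_bounds_ms)
        # One counter per bounded band plus one for the open-ended band.
        self.counts = [0] * (len(self.bounds) + 1)

    def record(self, latency_ms):
        # bisect_left returns the first band whose upper bound is >= latency.
        self.counts[bisect_left(self.bounds, latency_ms)] += 1

    def merge(self, other):
        # Counts from different servers add up directly, which is not
        # possible with percentiles computed separately on each server.
        if self.bounds != other.bounds:
            raise ValueError("band boundaries differ")
        merged = LatencyBands(self.bounds)
        merged.counts = [a + b for a, b in zip(self.counts, other.counts)]
        return merged

    def fraction_over(self, sla_ms):
        # The SLA threshold must coincide with a band boundary, otherwise
        # the counts cannot answer the question exactly.
        i = self.bounds.index(sla_ms)
        total = sum(self.counts)
        return sum(self.counts[i + 1:]) / total if total else 0.0


# Two hypothetical region servers, each recording four Get latencies (ms).
rs1, rs2 = LatencyBands(), LatencyBands()
for ms in (3, 7, 120, 45):
    rs1.record(ms)
for ms in (2000, 5, 60, 150):
    rs2.record(ms)

cluster = rs1.merge(rs2)
print(cluster.counts)              # [2, 1, 0, 1, 1, 2, 1]
print(cluster.fraction_over(100))  # 0.375
```

With a 100ms SLA, 3 of the 8 sample requests fall in bands above the 
threshold, so the merged counts report a 37.5% miss rate. The same question 
cannot be answered by combining per-server means or percentiles.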



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
