Sean Mackrory created HADOOP-14972:
--------------------------------------

             Summary: Histogram metrics types for latency, etc.
                 Key: HADOOP-14972
                 URL: https://issues.apache.org/jira/browse/HADOOP-14972
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
            Reporter: Sean Mackrory
            Assignee: Sean Mackrory


We'd like metrics that track latencies for various operations, for example 
per request type. This may need to be done differently from the current 
metrics types, which are just counters of type long, and it needs to be done 
intelligently: these measurements are very numerous, and they are primarily 
interesting because of outliers that are unpredictably far from normal. 
Some possible approaches:

* An adaptive, sparse histogram type. I envision something configurable with a 
maximum granularity and a maximum number of bins. Initially, datapoints are 
tallied in bins with the maximum granularity. When we reach the maximum number 
of bins, bins are merged in even / odd pairs. There's some complexity here, 
especially in making it perform well and allowing safe concurrency, but I like 
the ability to configure reasonable limits and retain as much granularity as 
possible without knowing the exact shape of the data beforehand.

* LongMetrics named "read_latency_600ms", "read_latency_800ms" to represent 
bins. This was suggested to me by [~fabbri]. I initially did not like the idea 
of having so many hard-coded bins for however many op types, but this could 
also be done dynamically (we just hard-code which measurements we take, and 
with what granularity to group them, e.g. read_latency, 200 ms). The resulting 
dataset could be sparse and dynamic to allow for extreme outliers, but the 
granularity is still pre-determined.

* We could also simply track a certain number of the highest latencies, plus 
basic descriptive statistics like a running average, min / max, etc. This is 
inherently more limited in what it can show us, but it is much simpler and 
might still provide some insight when analyzing performance.
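
To make the first option (the adaptive, sparse histogram) more concrete, here 
is a minimal sketch. The class name AdaptiveHistogram and its methods are 
hypothetical, not existing Hadoop metrics APIs. It assumes non-negative values 
and at least two bins, and it uses coarse synchronization purely for 
simplicity; a real implementation would likely need something cheaper.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of an adaptive, sparse histogram (not a Hadoop class). */
final class AdaptiveHistogram {
  private final int maxBins;
  private long binWidth; // current granularity, e.g. in milliseconds
  // Sparse storage: only non-empty bins are kept, keyed by bin index.
  private final TreeMap<Long, Long> bins = new TreeMap<>();

  AdaptiveHistogram(long finestBinWidth, int maxBins) {
    if (finestBinWidth <= 0 || maxBins < 2) {
      throw new IllegalArgumentException("need width > 0 and maxBins >= 2");
    }
    this.binWidth = finestBinWidth;
    this.maxBins = maxBins;
  }

  /** Tally one datapoint, coarsening the bins if the limit is exceeded. */
  synchronized void record(long value) {
    if (value < 0) {
      throw new IllegalArgumentException("negative value: " + value);
    }
    bins.merge(value / binWidth, 1L, Long::sum);
    while (bins.size() > maxBins) {
      coarsen();
    }
  }

  /** Merge bins 2k (even) and 2k+1 (odd) into bin k and double the width. */
  private void coarsen() {
    TreeMap<Long, Long> merged = new TreeMap<>();
    for (Map.Entry<Long, Long> e : bins.entrySet()) {
      merged.merge(e.getKey() / 2, e.getValue(), Long::sum);
    }
    bins.clear();
    bins.putAll(merged);
    binWidth *= 2;
  }

  synchronized long binWidth() {
    return binWidth;
  }

  /** Copy of the histogram, keyed by each bin's lower bound. */
  synchronized TreeMap<Long, Long> snapshot() {
    TreeMap<Long, Long> out = new TreeMap<>();
    for (Map.Entry<Long, Long> e : bins.entrySet()) {
      out.put(e.getKey() * binWidth, e.getValue());
    }
    return out;
  }
}
```

For example, with a finest width of 1 and at most 4 bins, recording 0, 1, 2, 3 
and then 10 triggers one merge: the width becomes 2 and the snapshot is 
{0=2, 2=2, 10=1}.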

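The second option (dynamically named counters with a fixed granularity) could 
look roughly like the sketch below. BinnedLatencyCounters is a hypothetical 
name, and plain LongAdder counters stand in for whatever long-valued metric 
type would really be used. It also assumes the number in the counter name is 
the lower bound of the bin, which the description above does not specify.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch: one long counter per non-empty latency bin. */
final class BinnedLatencyCounters {
  private final String prefix;      // e.g. "read_latency"
  private final long granularityMs; // e.g. 200
  // Counters are created on first use, so the set of bins stays sparse.
  private final ConcurrentHashMap<String, LongAdder> counters =
      new ConcurrentHashMap<>();

  BinnedLatencyCounters(String prefix, long granularityMs) {
    if (granularityMs <= 0) {
      throw new IllegalArgumentException("granularity must be positive");
    }
    this.prefix = prefix;
    this.granularityMs = granularityMs;
  }

  /** Assumption: bins are named after their lower bound in milliseconds. */
  String counterName(long latencyMs) {
    long lowerBound = (latencyMs / granularityMs) * granularityMs;
    return prefix + "_" + lowerBound + "ms";
  }

  void record(long latencyMs) {
    if (latencyMs < 0) {
      throw new IllegalArgumentException("negative latency: " + latencyMs);
    }
    counters.computeIfAbsent(counterName(latencyMs), k -> new LongAdder())
        .increment();
  }

  /** Current count for a named bin, or 0 if nothing landed in it. */
  long count(String name) {
    LongAdder adder = counters.get(name);
    return adder == null ? 0L : adder.sum();
  }

  int distinctBins() {
    return counters.size();
  }
}
```

With prefix "read_latency" and a granularity of 200 ms, latencies of 650 and 
799 both land in "read_latency_600ms", 800 lands in "read_latency_800ms", and 
an extreme outlier simply creates its own counter without any bins in between.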

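The third option (the highest latencies plus basic descriptive statistics) is 
the simplest to sketch. LatencySummary is a hypothetical name. The sketch 
keeps the N largest values in a min-heap and updates the mean incrementally, 
again with coarse synchronization only to keep the example short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: top-N latencies plus running mean, min and max. */
final class LatencySummary {
  private final int topN;
  // Min-heap: the smallest of the retained "largest" values is at the head.
  private final PriorityQueue<Long> largest = new PriorityQueue<>();
  private long count;
  private double mean;
  private long min = Long.MAX_VALUE;
  private long max = Long.MIN_VALUE;

  LatencySummary(int topN) {
    if (topN < 1) {
      throw new IllegalArgumentException("topN must be at least 1");
    }
    this.topN = topN;
  }

  synchronized void record(long latencyMs) {
    count++;
    mean += (latencyMs - mean) / count; // incremental running average
    min = Math.min(min, latencyMs);
    max = Math.max(max, latencyMs);
    if (largest.size() < topN) {
      largest.add(latencyMs);
    } else if (latencyMs > largest.peek()) {
      largest.poll(); // evict the smallest retained value
      largest.add(latencyMs);
    }
  }

  /** The retained highest latencies, largest first. */
  synchronized List<Long> highest() {
    List<Long> out = new ArrayList<>(largest);
    out.sort(Collections.reverseOrder());
    return out;
  }

  synchronized double mean() { return mean; }
  synchronized long min() { return min; }
  synchronized long max() { return max; }
  synchronized long count() { return count; }
}
```

The trade-off matches the description above: memory use is bounded by N and 
the outliers themselves are preserved exactly, but nothing is known about the 
shape of the distribution below the N-th highest value.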

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

