[ 
https://issues.apache.org/jira/browse/HADOOP-14972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-14972:
------------------------------------
    Parent Issue: HADOOP-15220  (was: HADOOP-14831)

> S3A add histogram metrics types for latency, etc.
> -------------------------------------------------
>
>                 Key: HADOOP-14972
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14972
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.9.0, 3.0.0
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>            Priority: Major
>
> We'd like metrics that track latencies for various operations, such as 
> latencies for various request types. This may need to be done differently 
> from the current metrics types, which are just counters of type long, and it 
> needs to be done intelligently: these measurements are very numerous and are 
> primarily interesting because of the outliers that are unpredictably far from 
> normal. A few ideas on how we might implement something like this:
> * An adaptive, sparse histogram type. I envision something configurable with 
> a maximum granularity and a maximum number of bins. Initially, data points 
> are tallied in bins with the maximum granularity. As we reach the maximum 
> number of bins, bins are merged in even / odd pairs. There's some complexity 
> here, especially to make it perform well and allow safe concurrency, but I 
> like the ability to configure reasonable limits and retain as much 
> granularity as possible without knowing the exact shape of the data 
> beforehand.
> * LongMetrics named "read_latency_600ms", "read_latency_800ms" to represent 
> bins. This was suggested to me by [~fabbri]. I initially did not like the 
> idea of having so many hard-coded bins for however many op types, but 
> this could also be done dynamically (we just hard-code which measurements we 
> take, and with what granularity to group them, e.g. read_latency, 200 ms). 
> The resulting dataset could be sparse and dynamic to allow for extreme 
> outliers, but the granularity is still pre-determined.
> * We could also simply track a certain number of the highest latencies, and 
> basic descriptive statistics like a running average, min / max, etc. 
> Inherently more limited in what it can show us, but much simpler and might 
> still provide some insight when analyzing performance.
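
The first idea above (an adaptive, sparse histogram) could be sketched roughly
as follows. This is only an illustration under stated assumptions, not a
proposed patch: the class name AdaptiveHistogram, its methods, and the
parameters finestWidth and maxBins are hypothetical and are not part of any
existing Hadoop API. It also sidesteps the concurrency question by simply
making every method synchronized, which is unlikely to be good enough for a
hot path.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of an adaptive, sparse histogram; not a Hadoop class. */
final class AdaptiveHistogram {
  private final int maxBins;
  private long binWidth;                               // current granularity
  private TreeMap<Long, Long> bins = new TreeMap<>();  // bin index -> count

  AdaptiveHistogram(long finestWidth, int maxBins) {
    // Assumption: at least two bins, so merging always terminates before
    // binWidth can overflow a long.
    if (finestWidth <= 0 || maxBins < 2) {
      throw new IllegalArgumentException("need finestWidth > 0, maxBins >= 2");
    }
    this.binWidth = finestWidth;
    this.maxBins = maxBins;
  }

  /** Tally one non-negative measurement, coarsening if the bin limit is hit. */
  synchronized void record(long value) {
    if (value < 0) {
      throw new IllegalArgumentException("negative value: " + value);
    }
    bins.merge(value / binWidth, 1L, Long::sum);
    while (bins.size() > maxBins) {
      coarsen();
    }
  }

  /** Merge bins in even / odd pairs: indices 2k and 2k+1 collapse into k. */
  private void coarsen() {
    TreeMap<Long, Long> merged = new TreeMap<>();
    for (Map.Entry<Long, Long> e : bins.entrySet()) {
      merged.merge(e.getKey() / 2, e.getValue(), Long::sum);
    }
    bins = merged;
    binWidth *= 2;
  }

  synchronized long binWidth() {
    return binWidth;
  }

  /** Copy of the histogram, keyed by each bin's lower bound. */
  synchronized TreeMap<Long, Long> snapshot() {
    TreeMap<Long, Long> out = new TreeMap<>();
    bins.forEach((index, count) -> out.put(index * binWidth, count));
    return out;
  }
}
```

For example, with finestWidth = 10 and maxBins = 4, recording 5, 15, 25, 35
and then 1000 forces one merge: the bin width doubles to 20 and the snapshot
is {0=2, 20=2, 1000=1}. Only bins that have seen data are stored, so a single
extreme outlier costs one extra bin rather than a long run of empty ones.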



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
