[
https://issues.apache.org/jira/browse/HBASE-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399454#comment-13399454
]
Elliott Clark commented on HBASE-5786:
--------------------------------------
The library we use takes time-decaying samples in buckets. So yes, we lose some
accuracy in the higher percentiles if the extreme data was recorded a long time
ago. For newer data, however, we are more accurate; if the spread of times stays
constant then we'll be very accurate. If our data were normally distributed we
would have less than 5% error (at least from my understanding of
http://www.research.att.com/people/Cormode_Graham/library/publications/CormodeShkapenyukSrivastavaXu09.pdf)
on all of the measures. For me, a 5% upper bound on the error of a metric seems
good enough. All of the other methods that I looked at take a lot longer to
compute, so I don't think they are worth it.
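
To make the trade-off concrete, here is a minimal sketch of forward-decay
priority sampling in the spirit of the linked paper. It is not the code of the
library mentioned above: the class name ForwardDecaySample, its parameters
(size, alpha, seed) and the latency numbers are all hypothetical, and a real
implementation would also rescale the landmark periodically so the weights do
not overflow.

```python
import heapq
import math
import random


class ForwardDecaySample:
    """Fixed-size sample biased toward recent values (illustrative sketch only)."""

    def __init__(self, size=1028, alpha=0.015, seed=None):
        self.size = size        # maximum number of retained samples
        self.alpha = alpha      # decay rate per time unit; larger favors newer data
        self.landmark = None    # reference time, set by the first update
        self.heap = []          # min-heap of (priority, weight, value)
        self.rng = random.Random(seed)

    def update(self, value, timestamp):
        if self.landmark is None:
            self.landmark = timestamp
        # Newer items get exponentially larger weights relative to the landmark.
        # Landmark rescaling (needed for long-running use) is omitted here.
        weight = math.exp(self.alpha * (timestamp - self.landmark))
        # 1 - random() lies in (0, 1], so the division is always safe.
        priority = weight / (1.0 - self.rng.random())
        entry = (priority, weight, value)
        if len(self.heap) < self.size:
            heapq.heappush(self.heap, entry)
        elif priority > self.heap[0][0]:
            # Evict the lowest-priority sample in favor of the new one.
            heapq.heapreplace(self.heap, entry)

    def quantile(self, q):
        """Return the weighted q-quantile (0 < q <= 1) of the retained samples."""
        if not self.heap:
            raise ValueError("no samples")
        items = sorted((value, weight) for _, weight, value in self.heap)
        total = sum(weight for _, weight in items)
        running = 0.0
        for value, weight in items:
            running += weight
            if running >= q * total:
                return value
        return items[-1][0]


# Hypothetical workload: an old 5000 ms spike followed by steady ~100 ms latencies.
sample = ForwardDecaySample(size=100, alpha=0.1, seed=42)
for t in range(1000):
    latency = 5000.0 if t < 10 else 100.0 + (t % 10)
    sample.update(latency, timestamp=t)

p99 = sample.quantile(0.99)
# The old spike has aged out of the sample, so p99 reflects only recent data.
```

With these made-up numbers the early spike no longer shows up in the 99th
percentile, which is the loss of accuracy for old extremes described above. A
smaller alpha, or a coarser time unit, would keep old extremes around longer.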
> Implement histogram metrics for flush and compaction latencies and sizes.
> -------------------------------------------------------------------------
>
> Key: HBASE-5786
> URL: https://issues.apache.org/jira/browse/HBASE-5786
> Project: HBase
> Issue Type: New Feature
> Components: metrics, regionserver
> Affects Versions: 0.92.2, 0.94.0, 0.96.0
> Reporter: Jonathan Hsieh
>
> Average time for region operations doesn't really tell a useful story that
> helps diagnose anomalous conditions.
> It would be extremely useful to add histogramming metrics similar to
> HBASE-5533 for region operations like flush, compaction and splitting. These
> probably should be forward biased at a much coarser granularity, however
> (maybe decay every day?).