[ https://issues.apache.org/jira/browse/HBASE-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399486#comment-13399486 ]
Andrew Wang commented on HBASE-5786:
------------------------------------
I don't think you can assume a normal distribution for latency. In practice it
looks more Zipfian, or maybe bimodal because of cache misses.
Also, a 5% error on a 95th percentile is kind of huge; IIUC, that means it's
actually reporting somewhere between the 90th and 100th percentile. [1], by the
same authors as your link, discusses sampling for high percentiles.
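
To make the 90th-to-100th point concrete, here is a rough Python sketch. It
assumes the 5% is a rank error (my reading, per the IIUC above), and the
latency values are made up to mimic a bimodal cache hit/miss pattern; the
helper names are hypothetical and not from HBase or from either paper.

```python
import math


def rank_error_bounds(target_pct, rank_error_pct):
    """Range of true percentiles an estimate may land on (rank-error reading)."""
    return max(0, target_pct - rank_error_pct), min(100, target_pct + rank_error_pct)


def nearest_rank_percentile(sorted_values, pct):
    """Exact nearest-rank percentile of an already sorted list."""
    if not sorted_values:
        raise ValueError("no samples")
    k = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[k - 1]


# Made-up latencies in ms: 900 cache hits, 95 cache misses, 5 pathological stalls.
latencies = sorted([1] * 900 + [50] * 95 + [500] * 5)

low_pct, high_pct = rank_error_bounds(95, 5)
low_value = nearest_rank_percentile(latencies, low_pct)
true_value = nearest_rank_percentile(latencies, 95)
high_value = nearest_rank_percentile(latencies, high_pct)

print("p95 estimate may be any value between p%d and p%d" % (low_pct, high_pct))
print("that is, anywhere from %d ms to %d ms (exact p95: %d ms)"
      % (low_value, high_value, true_value))
```

With a skewed or bimodal distribution like this one, the two ends of that range
can sit in completely different modes, so the reported number says little.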
I found [2], which I think is well suited to our use case, since it can do
approximate quantiles on a sliding time window. Space and time bounds seem to
be O(reasonable log factors). Somehow mashing up [2] to use [1] would be
ideal, but doing just [2] is probably okay too.
[1] http://www.cs.rutgers.edu/~muthu/bquant.pdf
[2] http://infolab.stanford.edu/~manku/papers/04pods-sliding.pdf
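
For reference, a minimal sketch of what "quantiles on a sliding time window"
means. This is NOT the algorithm from [2]: it is a naive exact baseline that
keeps every sample in the window and sorts on each query, so its space is
linear in the window size, which is the cost the approximate approach is meant
to avoid. The class and method names are hypothetical.

```python
import math
from collections import deque


class SlidingWindowQuantiles:
    """Exact quantiles over the last `window_seconds` of samples (naive baseline)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.samples = deque()  # (timestamp, value) pairs, oldest first

    def record(self, timestamp, value):
        # Assumes timestamps arrive in non-decreasing order.
        self.samples.append((timestamp, value))
        self._expire(timestamp)

    def _expire(self, now):
        # Drop samples that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self.samples and self.samples[0][0] <= cutoff:
            self.samples.popleft()

    def quantile(self, pct, now):
        """Nearest-rank percentile of the samples still inside the window."""
        self._expire(now)
        if not self.samples:
            return None
        values = sorted(value for _, value in self.samples)
        k = max(1, math.ceil(pct * len(values) / 100))
        return values[k - 1]


# Usage: one sample per second for 100 seconds, with a 60 second window.
window = SlidingWindowQuantiles(window_seconds=60)
for t in range(100):
    window.record(timestamp=t, value=t)

p95_now = window.quantile(95, now=99)
p95_later = window.quantile(95, now=200)  # every sample has expired by then
```

Old samples stop influencing the result as soon as they leave the window, which
is the behavior we want for flush and compaction latencies.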
> Implement histogram metrics for flush and compaction latencies and sizes.
> -------------------------------------------------------------------------
>
> Key: HBASE-5786
> URL: https://issues.apache.org/jira/browse/HBASE-5786
> Project: HBase
> Issue Type: New Feature
> Components: metrics, regionserver
> Affects Versions: 0.92.2, 0.94.0, 0.96.0
> Reporter: Jonathan Hsieh
>
> Average time for region operations doesn't really tell a useful story that
> helps diagnose anomalous conditions.
> It would be extremely useful to add histogramming metrics similar to
> HBASE-5533 for region operations like flush, compaction and splitting. They
> probably should be forward biased at a much coarser granularity however
> (maybe decay every day?)