[
https://issues.apache.org/jira/browse/HBASE-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927678#action_12927678
]
Gary Helmling commented on HBASE-1956:
--------------------------------------
After confusing myself yesterday, I did some testing up on EC2 with YCSB to see
if I could trigger a race condition causing the HFile and HLog counters to not
be reset. In my testing at least, either no race occurred or it wasn't
frequent enough to be noticeable. The counters were reset correctly on each
call to RegionServerMetrics.doUpdates().
However, the "*_num_ops" metrics _do_ continuously increment, but that is just
the way that MetricsTimeVaryingRate works: on each polling period, the reported
count is incremented by the number of operations seen during that period, so the
exported value is a running total. The same applies to the other
MetricsTimeVarying* classes.
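To illustrate, here is a minimal sketch of that counter behavior. This is not the actual Hadoop MetricsTimeVaryingRate class, just a simplified model (class and method names are mine) showing why *_num_ops only grows across polling periods:

```java
// Simplified model of MetricsTimeVaryingRate's counter behavior.
// Each poll resets the per-interval counters but folds the interval's
// op count into a running total -- which is what gets exported.
public class TimeVaryingRateModel {
    private int currentIntervalNumOps = 0;   // ops since the last poll
    private long currentIntervalTime = 0;    // total latency since the last poll
    private long reportedNumOps = 0;         // cumulative value that is exported

    // Record one operation and its latency.
    public void inc(long operationTime) {
        currentIntervalNumOps++;
        currentIntervalTime += operationTime;
    }

    // Called once per polling period (analogous to what happens during
    // doUpdates()): per-interval state resets, the exported count does not.
    public long pushMetric() {
        reportedNumOps += currentIntervalNumOps;
        currentIntervalNumOps = 0;
        currentIntervalTime = 0;
        return reportedNumOps;
    }

    public static void main(String[] args) {
        TimeVaryingRateModel rate = new TimeVaryingRateModel();
        rate.inc(10);
        rate.inc(20);
        System.out.println(rate.pushMetric()); // 2
        rate.inc(5);
        System.out.println(rate.pushMetric()); // 3 -- total keeps growing
    }
}
```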
In addition, the RegionServerMetrics.resetAllMinMax() method is never called by
anything in the Hadoop metrics update process (i.e. the MetricsContext
implementations). So the min and max values shown will be for all time (though
these are the min/max of the per-period _averages_, not of individual data
points, as Nicolas points out). You can manually invoke resetAllMinMax()
periodically via JMX, but nothing in Hadoop metrics will do it for you
automatically. That's just a limitation of how it works.
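For reference, a manual reset over JMX could look roughly like the sketch below. The MBean name and port here are assumptions (they depend on how JMX is configured in hbase-env.sh and on the HBase version); check the regionserver's JMX console for the actual registered name before relying on either:

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hedged sketch: connect to a regionserver's JMX port and invoke
// resetAllMinMax(). Host, port, and MBean name are assumptions --
// verify them against your deployment.
public class ResetMinMax {
    public static void main(String[] args) throws Exception {
        String host = args[0]; // regionserver hostname
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://" + host + ":10102/jmxrmi");
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
            // Assumed MBean name; look it up via jconsole if it differs.
            ObjectName name = new ObjectName(
                "hadoop:service=RegionServer,name=RegionServerStatistics");
            mbsc.invoke(name, "resetAllMinMax", null, null);
        }
    }
}
```

Run from cron (or similar) against each regionserver to get a periodic reset until something like HBASE-3129 makes it unnecessary.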
So from testing everything seems to be working correctly. We already have
HBASE-3129 to address improving the min/max values. If we want to add some
configurable reset period for those, I'd suggest we do so there.
So let's close this one out.
> Export HDFS read and write latency as a metric
> ----------------------------------------------
>
> Key: HBASE-1956
> URL: https://issues.apache.org/jira/browse/HBASE-1956
> Project: HBase
> Issue Type: Improvement
> Reporter: Andrew Purtell
> Assignee: Andrew Purtell
> Priority: Minor
> Fix For: 0.90.0
>
> Attachments: HBASE-1956.patch, HBASE-1956.patch
>
>
> HDFS write latency spikes especially are an indicator of general cluster
> overloading. We see this where the WAL writer complains about writes taking >
> 1 second, sometimes > 4, etc. If for example the average write latency over
> the monitoring period is exported as a metric, then this can feed into
> alerting for or automatic provisioning of additional cluster hardware. While
> we're at it, export read side metrics as well.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.