[
https://issues.apache.org/jira/browse/HBASE-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927678#action_12927678
]
Gary Helmling commented on HBASE-1956:
--------------------------------------
After confusing myself yesterday, I did some testing up on EC2 with YCSB to see
if I could trigger a race condition causing the HFile and HLog counters to not
be reset. In my testing at least, either no race occurred or it wasn't
frequent enough to be noticeable. The counters were reset correctly on each
call to RegionServerMetrics.doUpdates().
However, the "*_num_ops" metrics _do_ continuously increment, but that is just
the way that MetricsTimeVaryingRate works: on each polling period, the reported
count is incremented by the number of operations seen during that period, so the
exported value is a running total. The same applies to the other
MetricsTimeVarying* classes.
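To illustrate, here is a minimal sketch of that counter behavior. This is not the actual Hadoop MetricsTimeVaryingRate class, just a simplified model (class and method names are mine) showing why *_num_ops only grows across polling periods:

```java
// Simplified model of MetricsTimeVaryingRate's counter behavior.
// Each poll resets the per-interval counters but folds the interval's
// op count into a running total -- which is what gets exported.
public class TimeVaryingRateModel {
    private int currentIntervalNumOps = 0;   // ops since the last poll
    private long currentIntervalTime = 0;    // total latency since the last poll
    private long reportedNumOps = 0;         // cumulative value that is exported

    // Record one operation and its latency.
    public void inc(long operationTime) {
        currentIntervalNumOps++;
        currentIntervalTime += operationTime;
    }

    // Called once per polling period (analogous to what happens during
    // doUpdates()): per-interval state resets, the exported count does not.
    public long pushMetric() {
        reportedNumOps += currentIntervalNumOps;
        currentIntervalNumOps = 0;
        currentIntervalTime = 0;
        return reportedNumOps;
    }

    public static void main(String[] args) {
        TimeVaryingRateModel rate = new TimeVaryingRateModel();
        rate.inc(10);
        rate.inc(20);
        System.out.println(rate.pushMetric()); // 2
        rate.inc(5);
        System.out.println(rate.pushMetric()); // 3 -- total keeps growing
    }
}
```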
In addition, the RegionServerMetrics.resetAllMinMax() method is never called by
anything in the Hadoop metrics update process (i.e. the MetricsContext
implementations). So the min and max values shown will be for all time (though
these are the min/max of the per-period _averages_, not of individual data
points, as Nicolas points out). You can manually invoke resetAllMinMax()
periodically via JMX, but nothing in Hadoop metrics will do it for you
automatically. That's just a limitation of how it works.
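For reference, a manual reset over JMX could look roughly like the sketch below. The MBean name and port here are assumptions (they depend on how JMX is configured in hbase-env.sh and on the HBase version); check the regionserver's JMX console for the actual registered name before relying on either:

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hedged sketch: connect to a regionserver's JMX port and invoke
// resetAllMinMax(). Host, port, and MBean name are assumptions --
// verify them against your deployment.
public class ResetMinMax {
    public static void main(String[] args) throws Exception {
        String host = args[0]; // regionserver hostname
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://" + host + ":10102/jmxrmi");
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
            // Assumed MBean name; look it up via jconsole if it differs.
            ObjectName name = new ObjectName(
                "hadoop:service=RegionServer,name=RegionServerStatistics");
            mbsc.invoke(name, "resetAllMinMax", null, null);
        }
    }
}
```

Run from cron (or similar) against each regionserver to get a periodic reset until something like HBASE-3129 makes it unnecessary.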
So from testing everything seems to be working correctly. We already have
HBASE-3129 to address improving the min/max values. If we want to add some
configurable reset period for those, I'd suggest we do so there.
So let's close this one out.
> Export HDFS read and write latency as a metric
> ----------------------------------------------
>
> Key: HBASE-1956
> URL: https://issues.apache.org/jira/browse/HBASE-1956
> Project: HBase
> Issue Type: Improvement
> Reporter: Andrew Purtell
> Assignee: Andrew Purtell
> Priority: Minor
> Fix For: 0.90.0
>
> Attachments: HBASE-1956.patch, HBASE-1956.patch
>
>
> HDFS write latency spikes especially are an indicator of general cluster
> overloading. We see this where the WAL writer complains about writes taking >
> 1 second, sometimes > 4, etc. If for example the average write latency over
> the monitoring period is exported as a metric, then this can feed into
> alerting for or automatic provisioning of additional cluster hardware. While
> we're at it, export read side metrics as well.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.