[ https://issues.apache.org/jira/browse/HDFS-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthew Jacobs updated HDFS-3170:
---------------------------------
Status: Patch Available (was: Open)
The attached patch adds the write-latency metrics described in this JIRA. The
tests verify that the metrics are added. I also manually checked that the
averaged latency values were reasonable. For example, I added a sleep before
taking the ack end time and then verified that the resulting metric (read via
JMX) was greater than the sleep time.
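
As an illustration of the manual check described above, here is a minimal
sketch. It is not the code in the attached patch: AvgLatencyMetric,
AckTimingCheck and timeAck are hypothetical names, and the "sleep" is simulated
by advancing a fake clock so that the result is deterministic.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical averaged-latency metric; this is not the Hadoop metrics API. */
class AvgLatencyMetric {
    private long totalNanos;
    private long samples;

    /** Adds one latency sample, in nanoseconds. */
    synchronized void add(long elapsedNanos) {
        totalNanos += elapsedNanos;
        samples++;
    }

    /** Returns the mean of all samples so far, or 0.0 if there are none. */
    synchronized double averageNanos() {
        return samples == 0 ? 0.0 : (double) totalNanos / samples;
    }
}

class AckTimingCheck {
    /** Times one ack wait using the given clock and records it in the metric. */
    static void timeAck(LongSupplier clock, Runnable waitForAck, AvgLatencyMetric metric) {
        long start = clock.getAsLong();
        waitForAck.run();              // the injected delay happens in here
        long end = clock.getAsLong();  // the "ack end time"
        metric.add(end - start);
    }

    public static void main(String[] args) {
        AtomicLong fakeNow = new AtomicLong(0);   // assumed stand-in for System.nanoTime()
        long delayNanos = 50_000_000L;            // simulated 50 ms sleep
        AvgLatencyMetric metric = new AvgLatencyMetric();
        for (int i = 0; i < 3; i++) {
            // Each ack wait costs the injected delay plus 1 ms of simulated work.
            timeAck(fakeNow::get, () -> fakeNow.addAndGet(delayNanos + 1_000_000L), metric);
        }
        // The averaged latency should exceed the injected delay.
        System.out.println(metric.averageNanos() > delayNanos);
    }
}
```

Against a real DataNode the clock would be the system clock and the metric
would be read over JMX, but the comparison is the same.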
> Add more useful metrics for write latency
> -----------------------------------------
>
> Key: HDFS-3170
> URL: https://issues.apache.org/jira/browse/HDFS-3170
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: data-node
> Affects Versions: 2.0.0-alpha
> Reporter: Todd Lipcon
> Assignee: Matthew Jacobs
> Attachments: hdfs-3170.txt
>
>
> Currently, the only write-latency related metric we expose is the total
> amount of time taken by opWriteBlock. This is practically useless, since (a)
> different blocks may have wildly different sizes, and (b) if the writer is
> only generating data slowly, a block write will take longer through no fault
> of the DN. I would like to propose two new metrics:
> 1) *flush-to-disk time*: count how long it takes for each call to flush an
> incoming packet to disk (including the checksums). In most cases this will be
> close to 0, as it only flushes to buffer cache, but if the backing block
> device enters congested writeback, it can take much longer, which provides an
> interesting metric.
> 2) *round trip to downstream pipeline node*: track the round trip latency for
> the part of the pipeline between the local node and its downstream neighbors.
> When we add a new packet to the ack queue, save the current timestamp. When
> we receive an ack, update the metric based on how long since we sent the
> original packet. This gives a metric of the total RTT through the downstream
> part of the pipeline. If each node also includes this metric in the ack it
> sends upstream, we can subtract the time spent in the later stages of the
> pipeline and get an accurate measurement of this particular link.
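
The round-trip bookkeeping proposed in (2) could look roughly like the sketch
below. This is only an illustration under stated assumptions, not the DataNode
implementation: PipelineLatencySketch and its methods are hypothetical names,
acks are assumed to arrive in the order the packets were sent, and
downstreamNanos stands for the RTT value that the downstream node would report
in its ack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of per-link RTT tracking; not an actual HDFS class. */
class PipelineLatencySketch {
    /** Send timestamps (nanoseconds) of packets still awaiting an ack, oldest first. */
    private final Deque<Long> ackQueue = new ArrayDeque<>();
    private long totalLinkNanos = 0;
    private long ackCount = 0;

    /** Saves the current timestamp when a packet is added to the ack queue. */
    void onPacketEnqueued(long nowNanos) {
        ackQueue.addLast(nowNanos);
    }

    /**
     * Handles an ack for the oldest outstanding packet.
     *
     * @param nowNanos        time at which the ack was received
     * @param downstreamNanos RTT reported by the downstream node for the later
     *                        stages (0 if this node's neighbor is the last one)
     * @return the total RTT from this node through the rest of the pipeline,
     *         which is the value this node would report upstream
     */
    long onAckReceived(long nowNanos, long downstreamNanos) {
        long sentNanos = ackQueue.removeFirst();
        long totalRtt = nowNanos - sentNanos;
        // Subtract the later stages to isolate the link to the next node.
        long linkRtt = Math.max(0L, totalRtt - downstreamNanos);
        totalLinkNanos += linkRtt;
        ackCount++;
        return totalRtt;
    }

    /** Returns the mean per-link RTT over all acks seen so far. */
    double averageLinkNanos() {
        return ackCount == 0 ? 0.0 : (double) totalLinkNanos / ackCount;
    }

    public static void main(String[] args) {
        PipelineLatencySketch node = new PipelineLatencySketch();
        node.onPacketEnqueued(100L);
        long total = node.onAckReceived(600L, 200L);
        System.out.println(total + " " + node.averageLinkNanos());
    }
}
```

In the usage shown in main, a packet sent at t=100 and acked at t=600 has a
total RTT of 500; with 200 reported by the downstream node, 300 is attributed
to the local link.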