[ https://issues.apache.org/jira/browse/HADOOP-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511965#comment-14511965 ]
Anu Engineer commented on HADOOP-11873:
---------------------------------------
I don't know if this is useful for you, but HDFS already supports the following metrics:
| Name | Description |
|:---- |:---- |
| `TotalWriteTime` | Total number of milliseconds spent on write operation |
| `TotalReadTime` | Total number of milliseconds spent on read operation |
| `RemoteBytesRead` | Number of bytes read by remote clients |
| `RemoteBytesWritten` | Number of bytes written by remote clients |
If you look around, you should be able to see bytesRead and bytesWritten too.
Please see Metrics.md for more information. This went in as part of HDFS-7773.
> Include disk read/write time in FileSystem.Statistics
> -----------------------------------------------------
>
> Key: HADOOP-11873
> URL: https://issues.apache.org/jira/browse/HADOOP-11873
> Project: Hadoop Common
> Issue Type: New Feature
> Components: metrics
> Reporter: Kay Ousterhout
> Priority: Minor
>
> Measuring the time spent blocking on reading / writing data from / to disk is
> very useful for debugging performance problems in applications that read data
> from Hadoop, and can give much more information (e.g., to reflect disk
> contention) than just knowing the total amount of data read. I'd like to add
> something like "diskMillis" to FileSystem#Statistics to track this.
> For data read from HDFS, this can be done with very low overhead by adding
> logging around calls to RemoteBlockReader2.readNextPacket (because this reads
> larger chunks of data, the time added by the instrumentation is very small
> relative to the time to actually read the data). For data written to HDFS,
> this can be done in DFSOutputStream.waitAndQueueCurrentPacket.
> As far as I know, if you want this information today, it is only currently
> accessible by turning on HTrace. It looks like HTrace can't be selectively
> enabled, so a user can't just turn on the tracing on
> RemoteBlockReader2.readNextPacket for example, and instead needs to turn on
> tracing everywhere (which then introduces a bunch of overhead -- so sampling
> is necessary). It would be hugely helpful to have native metrics for time
> reading / writing to disk that are sufficiently low-overhead to be always on.
> (Please correct me if I'm wrong here about what's possible today!)
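
The always-on timing proposed in the description can be sketched roughly as follows. This is only an illustration under stated assumptions: `TimedInputStream`, `getReadMillis` and the other names are hypothetical and not part of Hadoop, and the wrapper times a plain `java.io.InputStream` rather than instrumenting RemoteBlockReader2.readNextPacket itself. It shows why the overhead stays small when each timed call moves a large chunk of data: the cost is two `System.nanoTime()` calls and one counter update per read.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical wrapper (not a Hadoop class): accumulates time spent blocked in read(). */
class TimedInputStream extends FilterInputStream {
    // LongAdder keeps the counters cheap to update even with several reader threads.
    private final LongAdder readNanos = new LongAdder();
    private final LongAdder bytesRead = new LongAdder();

    TimedInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        long start = System.nanoTime();
        try {
            int b = in.read();
            if (b >= 0) {
                bytesRead.increment();
            }
            return b;
        } finally {
            // Record elapsed time even if the underlying read throws.
            readNanos.add(System.nanoTime() - start);
        }
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        long start = System.nanoTime();
        try {
            int n = in.read(buf, off, len);
            if (n > 0) {
                bytesRead.add(n);
            }
            return n;
        } finally {
            readNanos.add(System.nanoTime() - start);
        }
    }

    long getReadNanos() {
        return readNanos.sum();
    }

    long getReadMillis() {
        return readNanos.sum() / 1_000_000L;
    }

    long getBytesRead() {
        return bytesRead.sum();
    }
}

class TimedInputStreamDemo {
    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for a real disk or network source.
        try (TimedInputStream in =
                 new TimedInputStream(new ByteArrayInputStream(new byte[1 << 20]))) {
            byte[] buf = new byte[64 * 1024]; // large reads keep per-call overhead small
            while (in.read(buf) != -1) {
                // discard the data; only the counters matter here
            }
            System.out.println("bytes read: " + in.getBytesRead());
            System.out.println("read millis: " + in.getReadMillis());
        }
    }
}
```

A counter like `getReadMillis()` is the kind of value that "diskMillis" in FileSystem#Statistics might expose; where exactly the timing would hook into the HDFS client is left open here.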