[jira] [Commented] (HADOOP-11873) Include disk read/write time in FileSystem.Statistics

Anu Engineer (JIRA) Fri, 24 Apr 2015 16:03:12 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511965#comment-14511965
 ]


Anu Engineer commented on HADOOP-11873:
---------------------------------------

I don't know if this is useful for you, but HDFS does support the following metrics:

| Name | Description |
|:---- |:---- |
| `TotalWriteTime` | Total number of milliseconds spent on write operation |
| `TotalReadTime` | Total number of milliseconds spent on read operation |
| `RemoteBytesRead` | Number of bytes read by remote clients |
| `RemoteBytesWritten` | Number of bytes written by remote clients |

If you look around, you should be able to see bytesRead and bytesWritten too.
Please see Metrics.md for more information. This went in as part of HDFS-7773.



> Include disk read/write time in FileSystem.Statistics
> -----------------------------------------------------
>
>                 Key: HADOOP-11873
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11873
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: metrics
>            Reporter: Kay Ousterhout
>            Priority: Minor
>
> Measuring the time spent blocking on reading / writing data from / to disk is 
> very useful for debugging performance problems in applications that read data 
> from Hadoop, and can give much more information (e.g., to reflect disk 
> contention) than just knowing the total amount of data read.  I'd like to add 
> something like "diskMillis" to FileSystem#Statistics to track this.
>
> For data read from HDFS, this can be done with very low overhead by adding 
> logging around calls to RemoteBlockReader2.readNextPacket (because this reads 
> larger chunks of data, the time added by the instrumentation is very small 
> relative to the time to actually read the data).  For data written to HDFS, 
> this can be done in DFSOutputStream.waitAndQueueCurrentPacket.
>
> As far as I know, if you want this information today, it is accessible only 
> by turning on HTrace. It looks like HTrace can't be selectively 
> enabled, so a user can't just turn on the tracing on 
> RemoteBlockReader2.readNextPacket for example, and instead needs to turn on 
> tracing everywhere (which then introduces a bunch of overhead -- so sampling 
> is necessary).  It would be hugely helpful to have native metrics for time 
> reading / writing to disk that are sufficiently low-overhead to be always on. 
> (Please correct me if I'm wrong here about what's possible today!)
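
The always-on timing idea in the description above can be sketched roughly as follows. This is only an illustration, not the HDFS implementation: `TimedInputStream` and its accessors are hypothetical names, and it wraps a plain `java.io.InputStream` rather than `RemoteBlockReader2.readNextPacket`. It accumulates the time spent blocked in read calls, analogous to the proposed "diskMillis" counter.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical sketch: an InputStream wrapper that accumulates the time
 * spent blocked inside read() calls. Not part of Hadoop.
 */
class TimedInputStream extends FilterInputStream {
    // LongAdder keeps update cost low when several threads share the counters.
    private final LongAdder readNanos = new LongAdder();
    private final LongAdder bytesRead = new LongAdder();

    TimedInputStream(InputStream in) {
        super(in);
    }

    // Single-byte reads are timed too, but the per-call overhead is only
    // negligible for the bulk read below, which moves larger chunks.
    @Override
    public int read() throws IOException {
        long start = System.nanoTime();
        try {
            int b = in.read();
            if (b >= 0) {
                bytesRead.increment();
            }
            return b;
        } finally {
            readNanos.add(System.nanoTime() - start);
        }
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        long start = System.nanoTime();
        try {
            int n = in.read(buf, off, len);
            if (n > 0) {
                bytesRead.add(n);
            }
            return n;
        } finally {
            // Recorded even if the underlying read throws.
            readNanos.add(System.nanoTime() - start);
        }
    }

    long getReadNanos() { return readNanos.sum(); }

    long getReadMillis() { return readNanos.sum() / 1_000_000L; }

    long getBytesRead() { return bytesRead.sum(); }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for a real disk or network source.
        TimedInputStream t =
            new TimedInputStream(new ByteArrayInputStream(new byte[1 << 20]));
        byte[] buf = new byte[64 * 1024];
        while (t.read(buf, 0, buf.length) > 0) {
            // Drain the stream; only the reads themselves are timed.
        }
        System.out.println("bytes=" + t.getBytesRead()
            + " readNanos=" + t.getReadNanos());
    }
}
```

The two `System.nanoTime()` calls per bulk read are the only added work, which is why timing at the granularity of large chunks (as the description suggests for packets) keeps the overhead small relative to the read itself.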



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
