[
https://issues.apache.org/jira/browse/HADOOP-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517987#comment-14517987
]
Anu Engineer commented on HADOOP-11873:
---------------------------------------
You can read them via http://localhost:<datanode-port>/jmx or via the JMX
APIs in Java. Over HTTP the data is returned in JSON format.
Here is an example:
{code}
curl -i http://localhost:50075/jmx
HTTP/1.1 200 OK
Cache-Control: no-cache
Expires: Tue, 28 Apr 2015 20:29:38 GMT
Date: Tue, 28 Apr 2015 20:29:38 GMT
Pragma: no-cache
Expires: Tue, 28 Apr 2015 20:29:38 GMT
Date: Tue, 28 Apr 2015 20:29:38 GMT
Pragma: no-cache
Content-Type: application/json; charset=utf-8
Access-Control-Allow-Methods: GET
Access-Control-Allow-Origin: *
Connection: close
Server: Jetty(6.1.26)
{
"beans" : [ {
"name" : "JMImplementation:type=MBeanServerDelegate",
"modelerType" : "javax.management.MBeanServerDelegate",
"MBeanServerId" : "hw11767.local_1430252919240",
"SpecificationName" : "Java Management Extensions",
"SpecificationVersion" : "1.4",
<<snip>>
{code}
For your purpose, if the computation is running on the same node as the
DataNode (as with MapReduce), the time reported by the DataNode should be very
close to the time spent reading data.
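To read the same beans programmatically, the standard javax.management API can be used. Here is a minimal sketch that queries the local platform MBeanServer (a remote DataNode would instead need a JMXConnector and the JMX service URL the daemon exposes; the class name JmxRead is just illustrative):
{code}
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxRead {
    public static void main(String[] args) throws Exception {
        // Query the same MBeanServerDelegate bean shown in the JSON above,
        // here from the local platform MBeanServer rather than over HTTP.
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("JMImplementation:type=MBeanServerDelegate");
        System.out.println(mbs.getAttribute(name, "SpecificationName"));
        // prints: Java Management Extensions
    }
}
{code}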
> Include disk read/write time in FileSystem.Statistics
> -----------------------------------------------------
>
> Key: HADOOP-11873
> URL: https://issues.apache.org/jira/browse/HADOOP-11873
> Project: Hadoop Common
> Issue Type: New Feature
> Components: metrics
> Reporter: Kay Ousterhout
> Priority: Minor
>
> Measuring the time spent blocking on reading / writing data from / to disk is
> very useful for debugging performance problems in applications that read data
> from Hadoop, and can give much more information (e.g., to reflect disk
> contention) than just knowing the total amount of data read. I'd like to add
> something like "diskMillis" to FileSystem#Statistics to track this.
> For data read from HDFS, this can be done with very low overhead by adding
> logging around calls to RemoteBlockReader2.readNextPacket (because this reads
> larger chunks of data, the time added by the instrumentation is very small
> relative to the time to actually read the data). For data written to HDFS,
> this can be done in DFSOutputStream.waitAndQueueCurrentPacket.
> As far as I know, this information is currently only accessible by
> turning on HTrace. It looks like HTrace can't be selectively
> enabled, so a user can't just turn on the tracing on
> RemoteBlockReader2.readNextPacket for example, and instead needs to turn on
> tracing everywhere (which then introduces a bunch of overhead -- so sampling
> is necessary). It would be hugely helpful to have native metrics for time
> reading / writing to disk that are sufficiently low-overhead to be always on.
> (Please correct me if I'm wrong here about what's possible today!)
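The low-overhead instrumentation proposed above could be sketched roughly as follows. This is only an illustration, not the actual patch: TimedRead, timedRead, and diskNanos are hypothetical names, and the real counter would live in FileSystem.Statistics around calls like RemoteBlockReader2.readNextPacket.
{code}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: time a blocking read and accumulate the elapsed
// nanoseconds into a counter, as a "diskMillis" statistic might.
public class TimedRead {
    static long diskNanos = 0; // would live in FileSystem.Statistics

    static int timedRead(InputStream in, byte[] buf) throws IOException {
        long start = System.nanoTime();
        int n = in.read(buf); // stands in for readNextPacket
        diskNanos += System.nanoTime() - start;
        return n;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[4096]);
        byte[] buf = new byte[1024];
        int n = timedRead(in, buf);
        System.out.println(n + " bytes read, " + diskNanos + " ns blocked");
    }
}
{code}
Because the timing wraps a whole packet-sized read rather than each byte, the two System.nanoTime() calls add negligible overhead relative to the I/O itself, which is the low-overhead property the issue asks for.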
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)