[ 
https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15256891#comment-15256891
 ] 

Steve Loughran commented on HDFS-10175:
---------------------------------------

yes, could have a talk. I actually thinks this is both related to and different 
from the s3a stuff... that is, it can go in without this, as it is currently 
more on some overall metrics and some very low level counters for testing 
stream behaviour (rather than for direct reporting, though that could come)

# some JMX-able stats for each s3a instance (taken from Azure, not per-thread)
# per-stream stats (volatile, merged with s3a stats above when stream is 
close()d
# tests to verify behaviour of the FS (e.g. lazy seek doesn't open files, 
read-ahead skips bytes, etc)

I agree about consistency; for the stream ones I've done the fields are tagged 
as volatile. These are merged into the JMX-layer instrumentation at close time, 
so sync costs are low


> add per-operation stats to FileSystem.Statistics
> ------------------------------------------------
>
>                 Key: HDFS-10175
>                 URL: https://issues.apache.org/jira/browse/HDFS-10175
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>            Reporter: Ram Venkatesh
>            Assignee: Mingliang Liu
>         Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, 
> HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, 
> HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in-turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to