[ 
https://issues.apache.org/jira/browse/HADOOP-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261118#comment-15261118
 ] 

Colin Patrick McCabe commented on HADOOP-13065:
-----------------------------------------------

Thanks, [~liuml07].

Based on the discussion today, it sounds like we would like to have both global 
statistics per FS class, and per-instance statistics for an individual FS or FC 
instance.  The rationale for this is that in some cases we might want to 
differentiate between, say, the stats when talking to one s3 bucket, and 
another s3 bucket.  Or another example is the stats talking to one HDFS FS 
versus another HDFS FS (if we are using federation, or just multiple HDFS 
instances).
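
As a rough illustration of keeping both scopes at once, here is a minimal sketch. Every name in it (ScopedCounters, addBytesRead, and so on) is hypothetical and not part of any Hadoop API, and keying the global counter by scheme string is an assumption standing in for "per FS class".  Each update is applied to a counter owned by the instance and to a shared counter for the whole scheme.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch only: these names do not exist in Hadoop.
final class ScopedCounters {
    // One aggregate counter per scheme (assumed stand-in for "per FS class").
    private static final Map<String, LongAdder> GLOBAL = new ConcurrentHashMap<>();

    private final LongAdder global;                  // shared by all instances of the scheme
    private final LongAdder local = new LongAdder(); // owned by this instance only

    ScopedCounters(String scheme) {
        this.global = GLOBAL.computeIfAbsent(scheme, k -> new LongAdder());
    }

    // Record bytes read against both the instance and the global counter.
    void addBytesRead(long n) {
        local.add(n);
        global.add(n);
    }

    long instanceBytesRead() {
        return local.sum();
    }

    static long globalBytesRead(String scheme) {
        LongAdder adder = GLOBAL.get(scheme);
        return adder == null ? 0L : adder.sum();
    }
}
```

With two instances created for the same scheme (say, two different s3 buckets), instanceBytesRead() tells them apart while globalBytesRead() reports their combined total.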

We talked a bit about metrics2, but several things made it a poor fit for this 
statistics interface.  One issue is that metrics2 assumes 
that statistics are permanent once created.  Effectively, it keeps them around 
until the JVM terminates.  metrics2 also tends to use a fair amount of memory 
and require a fair amount of boilerplate code compared to other solutions.  
Finally, because it is global, it can't do per-instance stats very effectively.

It would be nice for the new statistics interface to provide the same stats 
which are currently provided by FileSystem#Statistics.  This would allow us to 
deprecate and eventually remove FileSystem#Statistics as a public interface 
(although we might keep the implementation).  This could be done only in a new 
release of Hadoop, of course.  We also talked about the benefits of providing 
an iterator over all statistics rather than a map of all statistics.  
Relatedly, we talked about the desire to have a new interface that was abstract 
enough to accommodate new, more efficient implementations in the future.
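
To make the iterator idea concrete, here is a minimal sketch of what such an interface could look like. The names (LongStat, StatsView, MapBackedStats) are hypothetical and are not the interface that was agreed on.  Callers see only an iterator of name/value pairs plus a lookup by name, so the map behind this particular implementation stays hidden and could later be swapped for something more efficient.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch only: these names do not exist in Hadoop.
final class LongStat {
    final String name;
    final long value;

    LongStat(String name, long value) {
        this.name = name;
        this.value = value;
    }
}

// The abstract view: iteration plus lookup, with no commitment to a Map.
interface StatsView extends Iterable<LongStat> {
    // Returns the current value, or null if the statistic is not tracked.
    Long get(String name);
}

// One possible implementation; others could use arrays or thread-local data.
final class MapBackedStats implements StatsView {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    void increment(String name) {
        counters.computeIfAbsent(name, k -> new LongAdder()).increment();
    }

    @Override
    public Long get(String name) {
        LongAdder adder = counters.get(name);
        return adder == null ? null : adder.sum();
    }

    @Override
    public Iterator<LongStat> iterator() {
        final Iterator<Map.Entry<String, LongAdder>> it = counters.entrySet().iterator();
        return new Iterator<LongStat>() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public LongStat next() {
                // Snapshot each value as it is visited; no full map copy is made.
                Map.Entry<String, LongAdder> e = it.next();
                return new LongStat(e.getKey(), e.getValue().sum());
            }
        };
    }
}
```

A caller would simply write "for (LongStat s : stats)" and read s.name and s.value, without ever receiving a copy of the underlying map.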

For now, the new interface will deal with per-FS stats, but not per-stream 
ones.  We should revisit per-stream statistics later.

> add per-operation stats to FileSystem.Statistics
> ------------------------------------------------
>
>                 Key: HADOOP-13065
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13065
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Ram Venkatesh
>            Assignee: Mingliang Liu
>         Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, 
> HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, 
> HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in turn exposed as job counters by MapReduce and other frameworks. 
> The logic within DfsClient that maps operations to these counters can be 
> confusing; for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation, including create, append, 
> createSymlink, delete, exists, mkdirs, and rename, and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
