[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics

Mingliang Liu (JIRA) Mon, 21 Mar 2016 13:45:12 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205086#comment-15205086
 ]


Mingliang Liu commented on HDFS-10175:
--------------------------------------

Thanks for your comment, [~andrew.wang]. I was aware of the thread-local 
statistics data structure, and was in favor of following the same approach. The 
new operation map is still per-thread. The ConcurrentHashMap was used because, 
when aggregating, we have to guard against the map being modified concurrently. 
Its functionality is similar to that of the "volatile" keyword for the other 
primitive statistic data.
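
To illustrate the idea, here is a minimal sketch of a per-thread operation map 
that another thread can aggregate safely. It is not the code from the patch: 
the class PerThreadOpStats, the Op enum, and the increment/aggregate methods 
are hypothetical names chosen for this example, and it assumes Java 8 or later.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch; not the implementation in HDFS-10175.000.patch. */
class PerThreadOpStats {
    /** Hypothetical subset of the operations named in the issue description. */
    enum Op { CREATE, APPEND, DELETE, MKDIRS, RENAME }

    /** Counters owned by one thread: only the owner writes, aggregators only read. */
    static final class ThreadData {
        // ConcurrentHashMap lets an aggregating thread iterate safely while
        // the owning thread is still inserting new operation keys.
        final Map<Op, AtomicLong> opCounts = new ConcurrentHashMap<>();
    }

    // Every ThreadData ever created, so that aggregate() can visit all of them.
    private final Set<ThreadData> allData = ConcurrentHashMap.newKeySet();

    // Each thread lazily gets its own ThreadData and registers it in allData.
    private final ThreadLocal<ThreadData> threadData = ThreadLocal.withInitial(() -> {
        ThreadData data = new ThreadData();
        allData.add(data);
        return data;
    });

    /** Records one occurrence of op for the calling thread. */
    void increment(Op op) {
        threadData.get().opCounts
                .computeIfAbsent(op, k -> new AtomicLong())
                .incrementAndGet();
    }

    /** Sums the per-thread counters; may be called from any thread. */
    Map<Op, Long> aggregate() {
        Map<Op, Long> totals = new EnumMap<>(Op.class);
        for (ThreadData data : allData) {
            data.opCounts.forEach((op, count) -> totals.merge(op, count.get(), Long::sum));
        }
        return totals;
    }
}
```

In this sketch the concurrent map only protects the reader during aggregation. 
Each counter still has a single writer, so there is no contention between the 
threads that update the counters.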

Anyway, I will revise the code and update the patch if ConcurrentHashMap 
turns out to be unnecessary, for the sake of performance. Before that, the next 
patch will first resolve the conflicts with trunk caused by [HDFS-9579].

> add per-operation stats to FileSystem.Statistics
> ------------------------------------------------
>
>                 Key: HDFS-10175
>                 URL: https://issues.apache.org/jira/browse/HDFS-10175
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>            Reporter: Ram Venkatesh
>            Assignee: Mingliang Liu
>         Attachments: HDFS-10175.000.patch
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing; for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
