[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics

Steve Loughran (JIRA) Sat, 23 Apr 2016 08:27:37 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255285#comment-15255285
 ]


Steve Loughran commented on HDFS-10175:
---------------------------------------

# Don't break any existing filesystem code by adding new params to existing 
methods, etc.
# Add the new code outside of FileSystem.
# Use an int rather than an enum; this lets filesystems add their own counters. I 
hereby reserve 0x200-0x255 for object store operations. 

With an open int rather than an enum, the map size depends on the active 
ops, not the possible set. An initial hashmap using the int value as key should 
work, maybe with the default capacity set to that of the "standard" FS ops. Entry 
creation would have to be on demand. 
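
A minimal sketch of the hashmap option, assuming one map per thread so no synchronization is needed. The class name {{OpCounterMap}}, its methods, and the {{STANDARD_OP_COUNT}} constant are hypothetical and not part of any patch; 46 is simply the opcode count quoted in this comment.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-thread counter map keyed by an open int opcode. */
class OpCounterMap {
  // Assumption: sized for the "standard" FS ops (46 opcodes are quoted for the patch).
  static final int STANDARD_OP_COUNT = 46;

  private final Map<Integer, Long> counters = new HashMap<>(STANDARD_OP_COUNT);

  /** Adds count to the opcode's counter, creating the entry on first use. */
  long inc(int opcode, long count) {
    return counters.merge(opcode, count, Long::sum);
  }

  /** Returns the current value, or 0 if the opcode was never incremented. */
  long get(int opcode) {
    return counters.getOrDefault(opcode, 0L);
  }

  /** Number of distinct opcodes seen so far; this is what drives map size. */
  int activeOps() {
    return counters.size();
  }
}
```

A filesystem-specific opcode such as 0x200 costs nothing until it is first incremented, which is the point of keying on active ops rather than the possible set.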

Alternatively, fix the number of operations at compile time and store the 
counters in a volatile[] array, so storage is 4 bytes * ops * threads and lookup 
is O(1). With the 46 opcodes in the patch, that's 184 bytes/fs/thread. 

Here the increment operation returns the new value, or -1 in either of two 
cases: no logging, or no such opcode. An out-of-range opcode incurs the cost of 
raising an exception; with no counters, the cost is the probability and penalty 
of a branch misprediction.
{code}
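public long inc(int opcode, int count) {
  try {
    // counters is null when statistics collection is disabled
    return counters != null ? (counters[opcode] += count) : -1;
  } catch (ArrayIndexOutOfBoundsException e) {
    // opcode outside the fixed range
    return -1;
  }
}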
{code}

In this situation, the number of opcodes is fixed for a given Hadoop version; 
I'll just pre-reserve some of the values for object store operations.
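
As a rough illustration of the fixed-size variant with pre-reserved slots: the class {{FixedOpCounters}}, the constant names, and the figure of 16 reserved object store slots are all assumptions made for this sketch, not values from the patch.

```java
/** Hypothetical fixed-size per-thread counter table with reserved slots. */
class FixedOpCounters {
  static final int STANDARD_OPS = 46;              // opcode count quoted for the patch
  static final int RESERVED_OBJECT_STORE_OPS = 16; // assumption: size of the reserved block
  static final int FIRST_OBJECT_STORE_OP = STANDARD_OPS;
  static final int OP_COUNT = STANDARD_OPS + RESERVED_OBJECT_STORE_OPS;

  // null when statistics collection is disabled
  private final int[] counters;

  FixedOpCounters(boolean enabled) {
    counters = enabled ? new int[OP_COUNT] : null;
  }

  /** Returns the new counter value, or -1 if disabled or the opcode is out of range. */
  long inc(int opcode, int count) {
    if (counters == null) {
      return -1;
    }
    try {
      return counters[opcode] += count;
    } catch (ArrayIndexOutOfBoundsException e) {
      return -1;
    }
  }

  /** Storage for one thread's table: 4 bytes per opcode. */
  static int bytesPerThread(int opCount) {
    return Integer.BYTES * opCount;
  }
}
```

With these assumed numbers, reserving 16 extra slots adds 64 bytes per filesystem per thread on top of the 184 bytes for the 46 standard opcodes.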



> add per-operation stats to FileSystem.Statistics
> ------------------------------------------------
>
>                 Key: HDFS-10175
>                 URL: https://issues.apache.org/jira/browse/HDFS-10175
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>            Reporter: Ram Venkatesh
>            Assignee: Mingliang Liu
>         Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, 
> HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, 
> HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
