[ https://issues.apache.org/jira/browse/HADOOP-13305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingliang Liu updated HADOOP-13305:
-----------------------------------
    Description: 
The {{StorageStatistics}} class provides a fairly general interface, i.e. 
{{getLong(name)}} and {{getLongStatistics()}}. There are no shared or standard 
names for the storage statistics, so the names accepted by {{getLong(name)}} are 
up to each storage statistics implementation. The problems:
# For common statistics, downstream applications expect the same statistic 
name across different storage statistics and/or file system schemes. Currently 
they may have to use {{DFSOpsCountStorageStatistics#getLong("getStatus")}} and 
{{S3A.Statistics#getLong("get_status")}} to retrieve the getStatus operation 
stat.
# Moreover, probing per-operation stats is hard if there are no standard, 
shared names.

It makes a lot of sense for different schemes to publish per-operation stats 
under the same names. Meanwhile, every FS will have its own internal things to 
count, which can't be centrally defined or managed. But there are some common 
ones which would be easier to manage if they all had the same name.
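
To illustrate the idea, here is a minimal, self-contained Java sketch (not Hadoop code) of two toy per-scheme statistics objects that count the same operation under one shared name, so downstream code can query either with a single key. The names {{CommonStatNames}}, {{OP_GET_FILE_STATUS}}, {{"op_get_file_status"}}, {{ToyStorageStatistics}}, {{ToyDfsStatistics}}, {{ToyS3aStatistics}} and {{"s3a_object_metadata_requests"}} are hypothetical and chosen for illustration only; the real {{StorageStatistics}} API and whatever names this JIRA settles on may differ.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical shared statistic names; not an actual Hadoop interface. */
interface CommonStatNames {
  String OP_GET_FILE_STATUS = "op_get_file_status";
}

/** Toy stand-in for a per-scheme statistics object (NOT Hadoop's StorageStatistics). */
abstract class ToyStorageStatistics {
  private final Map<String, Long> counters = new HashMap<>();

  /** Increments the named counter by one, creating it if absent. */
  void increment(String name) {
    counters.merge(name, 1L, Long::sum);
  }

  /** Returns the counter value, or null if the name is not tracked. */
  Long getLong(String name) {
    return counters.get(name);
  }
}

/** HDFS-like toy: counts getFileStatus under the shared name. */
class ToyDfsStatistics extends ToyStorageStatistics {
  void onGetFileStatus() {
    increment(CommonStatNames.OP_GET_FILE_STATUS);
  }
}

/** S3A-like toy: same shared name, plus one hypothetical scheme-specific counter. */
class ToyS3aStatistics extends ToyStorageStatistics {
  void onGetFileStatus() {
    increment(CommonStatNames.OP_GET_FILE_STATUS);
    increment("s3a_object_metadata_requests"); // scheme-specific, not shared
  }
}

public class CommonNamesDemo {
  /** Downstream code queries any scheme with the same key; untracked means 0. */
  static long fileStatusCalls(ToyStorageStatistics stats) {
    Long v = stats.getLong(CommonStatNames.OP_GET_FILE_STATUS);
    return v == null ? 0L : v;
  }

  public static void main(String[] args) {
    ToyDfsStatistics dfs = new ToyDfsStatistics();
    ToyS3aStatistics s3a = new ToyS3aStatistics();
    dfs.onGetFileStatus();
    dfs.onGetFileStatus();
    s3a.onGetFileStatus();
    System.out.println("dfs=" + fileStatusCalls(dfs) + " s3a=" + fileStatusCalls(s3a));
  }
}
```

Running {{main}} prints {{dfs=2 s3a=1}}. Each scheme can still keep its own internal counters (the S3A-like toy adds one) alongside the shared names.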

Another motivation is that having a common set of names here will encourage 
uniform instrumentation of all filesystems; it will also make it easier to 
analyze the output of runs, were the stats to be published to a "performance 
log" similar to the audit log. See Steve's work for S3 (e.g. [HADOOP-13171]).

This JIRA is to track the effort of defining common StorageStatistics entry names. 
Thanks to [~cmccabe], [[email protected]], [~hitesh] and [~jnp] for offline 
discussion.

  was:
The {{StorageStatistics}} class provides a fairly general interface, i.e. 
{{getLong(name)}} and {{getLongStatistics()}}. There are no shared or standard 
names for the storage statistics, so the names accepted by {{getLong(name)}} are 
up to each storage statistics implementation. The problems:
# For common statistics, downstream applications expect the same statistic 
name across different storage statistics and/or file system schemes. Currently 
they may have to use {{DFSOpsCountStorageStatistics#getLong("getStatus")}} and 
{{S3A.Statistics#getLong("get_status")}} to retrieve the getStatus operation 
stat.
# Moreover, probing per-operation stats is hard if there are no standard, 
shared names.

It makes a lot of sense for different schemes to publish per-operation stats 
under the same names. Meanwhile, every FS will have its own internal things to 
count, which can't be centrally defined or managed. But there are some common 
ones which would be easier to manage if they all had the same name.

Another motivation is that having a common set of names here will encourage 
uniform instrumentation of all filesystems; it will also make it easier to 
analyze the output of runs, were the stats to be published to a "performance 
log" similar to the audit log. See Steve's work for S3 (e.g. [HADOOP-13171]).

This JIRA is to track the effort of defining common StorageStatistics entry names.


> Define common statistics names across schemes
> ---------------------------------------------
>
>                 Key: HADOOP-13305
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13305
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs
>    Affects Versions: 2.8.0
>            Reporter: Mingliang Liu
>            Assignee: Mingliang Liu
>             Fix For: 2.8.0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)