[ 
https://issues.apache.org/jira/browse/HADOOP-16830?focusedWorklogId=493047&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-493047
 ]

ASF GitHub Bot logged work on HADOOP-16830:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 30/Sep/20 16:12
            Start Date: 30/Sep/20 16:12
    Worklog Time Spent: 10m 
      Work Description: steveloughran commented on pull request #2323:
URL: https://github.com/apache/hadoop/pull/2323#issuecomment-701491715


   > error. This is solved when I add ".mean" at the end of the DurationTracker 
stat name. The same with Max(".max" has to be added) and Min(".min" has to be 
added) to fetch their values.
   > Was thinking if we could add it to the getter rather than having to append 
with stat name.
   
   I'm a bit reluctant to, as people would then ask about failures next. What 
could be handy is something in the support class to get all stats for a 
duration (or null), with some struct to contain them all, e.g.
   
   `fetchDurationStatistics(IOStatistics, key) -> {count, min, max, mean, 
failed, failed.min, failed.max, failed.mean}`, and you'd then work off that. 
Seem good? 
   Trouble spot: what if only some of the values were found? They'd be null in 
the result. Maybe we'd let you ask for the success/failure stats separately, 
deal with it that way, and have some `boolean isComplete()` probe to check that 
all are set.
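
   A rough sketch of what such a helper could look like, to make the shape of 
the proposal concrete. Everything here is hypothetical: the class and field 
names, the use of a plain `Map<String, Number>` as a stand-in for 
`IOStatistics`, and the `.failed` suffix are assumptions for illustration only, 
not the actual Hadoop API. Only the `.min`, `.max` and `.mean` suffixes come 
from the discussion above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; not part of the Hadoop IOStatistics API. */
class DurationStatsSketch {

  /** One half (success or failure) of a duration's stats; null means "not found". */
  static final class Summary {
    final Long count;
    final Long min;
    final Long max;
    final Double mean;

    Summary(Long count, Long min, Long max, Double mean) {
      this.count = count;
      this.min = min;
      this.max = max;
      this.mean = mean;
    }

    /** True only if every value in this half was found. */
    boolean isComplete() {
      return count != null && min != null && max != null && mean != null;
    }
  }

  /** Struct holding the success and failure summaries for one key. */
  static final class DurationStatistics {
    final Summary success;
    final Summary failed;

    DurationStatistics(Summary success, Summary failed) {
      this.success = success;
      this.failed = failed;
    }

    /** True only if both halves are fully populated. */
    boolean isComplete() {
      return success.isComplete() && failed.isComplete();
    }
  }

  private static Long asLong(Number n) {
    return n == null ? null : Long.valueOf(n.longValue());
  }

  private static Double asDouble(Number n) {
    return n == null ? null : Double.valueOf(n.doubleValue());
  }

  /** Look up prefix, prefix.min, prefix.max and prefix.mean; missing entries stay null. */
  static Summary fetchSummary(Map<String, Number> stats, String prefix) {
    return new Summary(
        asLong(stats.get(prefix)),
        asLong(stats.get(prefix + ".min")),
        asLong(stats.get(prefix + ".max")),
        asDouble(stats.get(prefix + ".mean")));
  }

  /** Fetch both halves; the ".failed" suffix is an assumed naming convention. */
  static DurationStatistics fetchDurationStatistics(Map<String, Number> stats, String key) {
    return new DurationStatistics(
        fetchSummary(stats, key),
        fetchSummary(stats, key + ".failed"));
  }

  public static void main(String[] args) {
    Map<String, Number> stats = new HashMap<>();
    stats.put("op_rename", 4L);
    stats.put("op_rename.min", 10L);
    stats.put("op_rename.max", 90L);
    stats.put("op_rename.mean", 42.5);
    stats.put("op_rename.failed", 1L); // failure min/max/mean deliberately absent

    DurationStatistics d = fetchDurationStatistics(stats, "op_rename");
    System.out.println("success complete: " + d.success.isComplete());
    System.out.println("all complete: " + d.isComplete());
  }
}
```

   Keeping the success and failure halves as separate objects means a caller 
can probe each half with its own `isComplete()` and ignore the other when only 
some of the values were recorded.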


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 493047)
    Time Spent: 6h 20m  (was: 6h 10m)

> Add public IOStatistics API
> ---------------------------
>
>                 Key: HADOOP-16830
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16830
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs, fs/s3
>    Affects Versions: 3.3.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> Applications like to collect statistics on specific operations, 
> by collecting exactly those operations done during the execution of FS API 
> calls by their individual worker threads, and returning these to their job 
> driver.
> * S3A has a statistics API for some streams, but it's a non-standard one; 
> Impala &c can't use it
> * FileSystem storage statistics are public, but as they aren't cross-thread, 
> they don't aggregate properly
> Proposed
> # A new IOStatistics interface to serve up statistics
> # S3A to implement
> # other stores to follow
> # Pass-through from the usual wrapper classes (FS data input/output streams)
> It's hard to think about how best to offer an API for operation context 
> stats, and how to actually implement it.
> ThreadLocal isn't enough because the helper threads need to update the 
> thread-local value of the instigator.
> My initial PoC doesn't address that issue, but it shows what I'm thinking of.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

