[jira] [Updated] (HADOOP-17469) IOStatistics Phase II

Steve Loughran (Jira) Thu, 14 Jan 2021 06:12:05 -0800


     [ 
https://issues.apache.org/jira/browse/HADOOP-17469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Steve Loughran updated HADOOP-17469:
------------------------------------
    Description: 
Continue IOStatistics development with goals of

* Easy adoption in applications
* better instrumentation in hadoop codebase (distcp?)
* more stats in abfs and s3a connectors

A key feature has to be a thread-level context for statistics, so that app 
code doesn't have to explicitly ask for the stats for each worker thread. 
Instead, filesystem components update the context stats as well as thread 
stats (when?), and apps can then pick them up.
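
A minimal sketch of what such a thread-level context could look like, assuming 
a ThreadLocal holder with per-key counters. The names ThreadStatsContext, 
increment(), snapshot() and the counter keys are hypothetical and are not the 
Hadoop IOStatistics API; the sketch only illustrates filesystem code updating 
whatever context belongs to the current thread, with the app reading it later.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-thread statistics context; not a Hadoop class. */
final class ThreadStatsContext {
  // One context per thread, created lazily on first use.
  private static final ThreadLocal<ThreadStatsContext> CURRENT =
      ThreadLocal.withInitial(ThreadStatsContext::new);

  // LongAdder keeps increments cheap; this code takes no explicit locks.
  private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

  static ThreadStatsContext current() {
    return CURRENT.get();
  }

  void increment(String key, long delta) {
    counters.computeIfAbsent(key, k -> new LongAdder()).add(delta);
  }

  /** Copy the current counter values into a sorted, detached map. */
  Map<String, Long> snapshot() {
    Map<String, Long> copy = new TreeMap<>();
    counters.forEach((k, v) -> copy.put(k, v.sum()));
    return copy;
  }
}

final class ThreadStatsDemo {
  // Stand-in for a filesystem component: it updates the calling thread's
  // context without being handed a collector by the application.
  static void fakeRead(int bytes) {
    ThreadStatsContext ctx = ThreadStatsContext.current(); // single lookup
    ctx.increment("stream_read_bytes", bytes);
    ctx.increment("stream_read_operations", 1);
  }

  public static void main(String[] args) {
    fakeRead(100);
    fakeRead(28);
    // The application picks up the stats for its own thread afterwards.
    System.out.println(ThreadStatsContext.current().snapshot());
  }
}
```

The app never passes a collector into fakeRead(); it only asks for the 
snapshot of its own thread once the work is done.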

* need to manage performance by minimising inefficient lookups, lock 
acquisition etc. on what should be memory-only ops (read(), write())
* for duration tracking, cut down on calls to System.currentTimeMillis() so 
that only one is made per operation
* need to propagate the context into worker threads
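
One way the propagation and timing points above might be approached, as a 
sketch only. StatsContext, propagate(), timed() and runInPool() are 
hypothetical names, not Hadoop classes. The assumption is that the submitting 
thread's context is captured when a task is created and bound in whichever 
pool thread runs it. The timing helper looks up the context once per operation 
and reads System.nanoTime() at the start and end only; how the "one call per 
operation" target would actually be met is not decided here.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical shared statistics context; not a Hadoop class. */
final class StatsContext {
  final LongAdder operations = new LongAdder();
  final LongAdder totalNanos = new LongAdder();

  private static final ThreadLocal<StatsContext> CURRENT = new ThreadLocal<>();

  static void bind(StatsContext ctx) {
    CURRENT.set(ctx);
  }

  static void unbind() {
    CURRENT.remove();
  }

  /** Capture the caller's context now; bind it in the thread that runs the task. */
  static Runnable propagate(Runnable task) {
    final StatsContext captured = CURRENT.get();
    return () -> {
      StatsContext previous = CURRENT.get();
      CURRENT.set(captured);
      try {
        task.run();
      } finally {
        CURRENT.set(previous); // do not leak the context into pooled threads
      }
    };
  }

  /** Time an operation with one context lookup and two clock reads. */
  static void timed(Runnable op) {
    StatsContext ctx = CURRENT.get();
    long start = System.nanoTime();
    try {
      op.run();
    } finally {
      if (ctx != null) {
        ctx.totalNanos.add(System.nanoTime() - start);
        ctx.operations.increment();
      }
    }
  }
}

final class PropagationDemo {
  /** Run empty timed tasks in a pool; return the operations the context saw. */
  static long runInPool(int tasks) {
    StatsContext ctx = new StatsContext();
    StatsContext.bind(ctx);
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      for (int i = 0; i < tasks; i++) {
        pool.execute(StatsContext.propagate(() -> StatsContext.timed(() -> { })));
      }
    } finally {
      pool.shutdown();
      StatsContext.unbind();
    }
    try {
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return ctx.operations.sum();
  }

  public static void main(String[] args) {
    System.out.println("operations recorded: " + runInPool(4));
  }
}
```

Wrapping at submission time keeps the worker code unchanged: the tasks call 
timed() without knowing which context they report to.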

Target uses

* Impala 
* Spark via SPARK-29397 
* S3A committers
* Iceberg.

I have a WiP Parquet branch too, to see what can be done there. This shows 
why the thread context is needed, as it's unworkable to build up your own 
stats snapshot. Even if you collect it for listX and stream reads, it doesn't 
include FS operations (e.g. rename()), and you need to rework all your methods 
to pass the stats collector around.


  was:
Continue IOStatistics development with goals of

* Easy adoption in applications
* better instrumentation in hadoop codebase (distcp?)
* more stats in abfs and s3a connectors

A key has to be a thread level context for statistics so that app code doesn't 
have to explicitly ask for the stats for each worker thread. Instead 

filesystem components update the context stats as well as thread stats (when?) 
and then apps can pick up.

* need to manage performance by minimising inefficient lookups, lock 
acquisition etc on what should be memory-only ops (read()), (write()),
* and for duration tracking, cut down on calls to System.currentTime() so that 
only 1 should be made per operation, 
* need to propagate the context into worker threads



> IOStatistics Phase II
> ---------------------
>
>                 Key: HADOOP-17469
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17469
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs, fs/azure, fs/s3
>    Affects Versions: 3.3.1
>            Reporter: Steve Loughran
>            Priority: Major


