[
https://issues.apache.org/jira/browse/HADOOP-17469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steve Loughran updated HADOOP-17469:
------------------------------------
Description:
Continue IOStatistics development with goals of
* Easy adoption in applications
* better instrumentation in hadoop codebase (distcp?)
* more stats in abfs and s3a connectors
A key requirement is a thread-level context for statistics, so that app code
doesn't have to explicitly ask for the stats for each worker thread. Instead,
filesystem components update the context stats as well as thread stats (when?)
and apps can then pick them up.
* need to manage performance by minimising inefficient lookups, lock
acquisition etc. on what should be memory-only ops (read(), write()),
* and for duration tracking, cut down on calls to System.currentTimeMillis()
so that only one is made per operation,
* need to propagate the context into worker threads
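
To make the idea concrete, here is a minimal sketch of a thread-level
statistics context that is handed on to worker threads. It is an illustration
only and not the Hadoop API: StatsContext, current(), setCurrent(), propagate()
and the "stream_read_bytes" key are all hypothetical names. Counters are
LongAdders held in a ConcurrentHashMap, so an update takes no explicit lock.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-thread statistics context; not the Hadoop API. */
final class StatsContext {
  // Each thread lazily gets its own context unless one is set explicitly.
  private static final ThreadLocal<StatsContext> CURRENT =
      ThreadLocal.withInitial(StatsContext::new);

  private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

  /** Context bound to the calling thread. */
  static StatsContext current() {
    return CURRENT.get();
  }

  /** Bind a context to the calling thread. */
  static void setCurrent(StatsContext ctx) {
    CURRENT.set(ctx);
  }

  /** Add to a named counter; safe to call from several threads. */
  void increment(String key, long value) {
    counters.computeIfAbsent(key, k -> new LongAdder()).add(value);
  }

  /** Current value of a counter, or 0 if it was never updated. */
  long get(String key) {
    LongAdder adder = counters.get(key);
    return adder == null ? 0 : adder.sum();
  }

  /**
   * Wrap a task so that it runs under the context of the thread which
   * called propagate(), restoring the worker's own context afterwards.
   */
  static Runnable propagate(Runnable task) {
    final StatsContext parent = current();
    return () -> {
      StatsContext previous = current();
      setCurrent(parent);
      try {
        task.run();
      } finally {
        setCurrent(previous);
      }
    };
  }
}

final class StatsContextDemo {
  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    for (int i = 0; i < 4; i++) {
      // The worker stands in for a filesystem component updating the context.
      pool.submit(StatsContext.propagate(
          () -> StatsContext.current().increment("stream_read_bytes", 1024)));
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
    // The submitting thread sees the work of all four tasks: prints 4096.
    System.out.println(StatsContext.current().get("stream_read_bytes"));
  }
}
```

The application only touches StatsContext.current() at the end; nothing has to
pass a collector through the worker code. The sketch does not cover duration
tracking or the question of when thread stats are updated.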
Target uses
* Impala
* Spark via SPARK-29397
* S3A committers
* Iceberg.
I have a WiP Parquet branch too, to see what can be done there. This shows up
how the thread context is needed, as it's unworkable to build up your own stats
snapshot. Even if you collect it for listX and stream reads, it doesn't include
FS operations (e.g. rename()) and you need to rework all your methods to pass
the stats collector around.
was:
Continue IOStatistics development with goals of
* Easy adoption in applications
* better instrumentation in hadoop codebase (distcp?)
* more stats in abfs and s3a connectors
A key requirement is a thread-level context for statistics, so that app code
doesn't have to explicitly ask for the stats for each worker thread. Instead,
filesystem components update the context stats as well as thread stats (when?)
and apps can then pick them up.
* need to manage performance by minimising inefficient lookups, lock
acquisition etc. on what should be memory-only ops (read(), write()),
* and for duration tracking, cut down on calls to System.currentTimeMillis()
so that only one is made per operation,
* need to propagate the context into worker threads
> IOStatistics Phase II
> ---------------------
>
> Key: HADOOP-17469
> URL: https://issues.apache.org/jira/browse/HADOOP-17469
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs, fs/azure, fs/s3
> Affects Versions: 3.3.1
> Reporter: Steve Loughran
> Priority: Major
>
> Continue IOStatistics development with goals of
> * Easy adoption in applications
> * better instrumentation in hadoop codebase (distcp?)
> * more stats in abfs and s3a connectors
> A key requirement is a thread-level context for statistics, so that app code
> doesn't have to explicitly ask for the stats for each worker thread. Instead,
> filesystem components update the context stats as well as thread stats
> (when?) and apps can then pick them up.
> * need to manage performance by minimising inefficient lookups, lock
> acquisition etc. on what should be memory-only ops (read(), write()),
> * and for duration tracking, cut down on calls to
> System.currentTimeMillis() so that only one is made per operation,
> * need to propagate the context into worker threads
> Target uses
> * Impala
> * Spark via SPARK-29397
> * S3A committers
> * Iceberg.
> I have a WiP Parquet branch too, to see what can be done there. This shows up
> how the thread context is needed, as it's unworkable to build up your own
> stats snapshot. Even if you collect it for listX and stream reads, it doesn't
> include FS operations (e.g. rename()) and you need to rework all your methods
> to pass the stats collector around.