[
https://issues.apache.org/jira/browse/HADOOP-17469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steve Loughran reassigned HADOOP-17469:
---------------------------------------
Assignee: Mehakmeet Singh
> IOStatistics Phase II
> ---------------------
>
> Key: HADOOP-17469
> URL: https://issues.apache.org/jira/browse/HADOOP-17469
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs, fs/azure, fs/s3
> Affects Versions: 3.3.1
> Reporter: Steve Loughran
> Assignee: Mehakmeet Singh
> Priority: Major
>
> Continue IOStatistics development with goals of
> * Easy adoption in applications
> * Better instrumentation in the Hadoop codebase (distcp?)
> * More stats in the abfs and s3a connectors
> A key part has to be a thread-level context for statistics, so that app code
> doesn't have to explicitly ask for the stats for each worker thread. Instead,
> filesystem components update the context stats as well as thread stats
> (when?) and then apps can pick them up.
> * need to manage performance by minimising inefficient lookups, lock
> acquisition, etc. on what should be memory-only ops (read(), write()),
> * and for duration tracking, cut down on calls to System.currentTimeMillis() so
> that only one is made per operation,
> * need to propagate the context into worker threads
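
To make the thread-level context idea concrete, here is a minimal sketch in plain Java. It is not the Hadoop API: StatsContext, current(), propagate() and the counter name "stream_read_bytes" are all hypothetical names chosen for illustration. It assumes a context is just a set of named counters held in a ThreadLocal, and that propagation means a worker temporarily adopts the submitting thread's context.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class ThreadStatsContextSketch {

  /** Hypothetical context: named counters that several threads may update. */
  public static final class StatsContext {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    public void increment(String key, long value) {
      // LongAdder keeps the hot path cheap when many threads update one counter.
      counters.computeIfAbsent(key, k -> new LongAdder()).add(value);
    }

    public long get(String key) {
      LongAdder adder = counters.get(key);
      return adder == null ? 0L : adder.sum();
    }
  }

  /** Each thread gets its own context unless one is propagated into it. */
  private static final ThreadLocal<StatsContext> CURRENT =
      ThreadLocal.withInitial(StatsContext::new);

  public static StatsContext current() {
    return CURRENT.get();
  }

  /** Wrap a task so it runs under the context of the thread that submitted it. */
  public static Runnable propagate(Runnable task) {
    final StatsContext parent = current(); // captured on the submitting thread
    return () -> {
      StatsContext previous = CURRENT.get();
      CURRENT.set(parent);
      try {
        task.run();
      } finally {
        CURRENT.set(previous); // restore, since pool threads are reused
      }
    };
  }

  /** Stand-in for a filesystem component: it updates whatever context is active. */
  public static void simulatedRead(int bytes) {
    current().increment("stream_read_bytes", bytes);
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    for (int i = 0; i < 8; i++) {
      pool.submit(propagate(() -> simulatedRead(100)));
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
    // The application never asked any worker for its stats.
    System.out.println(current().get("stream_read_bytes")); // prints 800
  }
}
```

The point of the wrapper is that application code reads one context on its own thread, while the simulated filesystem call needs no extra parameter. A real implementation would also have to decide when per-thread and context-level statistics are each updated, which this sketch leaves out.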
> Target uses
> * Impala
> * Spark via SPARK-29397
> * S3A committers
> * Iceberg.
> I have a WiP Parquet branch too, to see what can be done there. This shows
> how the thread context is needed, as it's unworkable to build up your own stats
> snapshot. Even if you collect it for listX and stream reads, it doesn't
> include FS operations (e.g. rename()), and you need to rework all your methods
> to pass the stats collector around.