Steve Loughran created HADOOP-18650:
---------------------------------------
Summary: improve s3a committer stats collected
Key: HADOOP-18650
URL: https://issues.apache.org/jira/browse/HADOOP-18650
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/s3
Affects Versions: 3.3.5
Reporter: Steve Loughran
We can improve the stats collected in the s3a committer and saved to the JSON.
Key ones:
# number of task manifests read; duration of loads
# size of each manifest
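
As a rough illustration of the stats listed above, here is a minimal sketch of an accumulator that a manifest loader could update after each load. The class `ManifestLoadStats` and its methods are hypothetical names invented for this example; they are not part of the Hadoop codebase, and a real patch would presumably go through the existing statistics plumbing rather than a standalone class.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical accumulator for manifest load statistics.
 * Illustration only: not an existing Hadoop class.
 */
public class ManifestLoadStats {
  private int manifestsRead;
  private long totalLoadMillis;
  private final List<Long> manifestSizes = new ArrayList<>();

  /** Record one manifest load: its size in bytes and how long it took. */
  public synchronized void recordLoad(long sizeBytes, long loadMillis) {
    manifestsRead++;
    totalLoadMillis += loadMillis;
    manifestSizes.add(sizeBytes);
  }

  /** Number of task manifests read so far. */
  public synchronized int getManifestsRead() {
    return manifestsRead;
  }

  /** Sum of all recorded load durations, in milliseconds. */
  public synchronized long getTotalLoadMillis() {
    return totalLoadMillis;
  }

  /** Size of each manifest, in the order the loads were recorded. */
  public synchronized List<Long> getManifestSizes() {
    return new ArrayList<>(manifestSizes);
  }

  /** Largest manifest seen, or 0 if none have been recorded. */
  public synchronized long getMaxManifestSize() {
    long max = 0;
    for (long size : manifestSizes) {
      max = Math.max(max, size);
    }
    return max;
  }
}
```

Values like these could then be written out alongside the other fields in the JSON.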
I think we would also benefit if we could make the commit thread pools big,
but shared across all jobs (i.e. a demand-created thread pool in the s3a fs).
That would allow for a pool size of, say, 500, while still supporting many jobs
actively committing at the same time (a busy Spark driver).
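
One way the shared, demand-created pool could look is sketched below using only `java.util.concurrent`. The class name `SharedCommitPool` is hypothetical, and this is an assumption about the shape of the idea rather than how the s3a filesystem would actually implement it. The pool is created on first use, grows one thread per submitted task up to the cap, and lets idle threads time out so an inactive filesystem holds no threads.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical holder for one commit pool shared by every job
 * using the same filesystem instance. Illustration only.
 */
public class SharedCommitPool {
  private final int maxThreads;
  private ThreadPoolExecutor pool;

  public SharedCommitPool(int maxThreads) {
    this.maxThreads = maxThreads;
  }

  /** Return the shared pool, creating it on first demand. */
  public synchronized ThreadPoolExecutor get() {
    if (pool == null) {
      // core == max with an unbounded queue: a new thread is started per
      // submission until maxThreads exist; later tasks queue up.
      pool = new ThreadPoolExecutor(
          maxThreads, maxThreads,
          60L, TimeUnit.SECONDS,
          new LinkedBlockingQueue<>());
      // Let idle threads exit, so the large cap costs nothing when quiet.
      pool.allowCoreThreadTimeOut(true);
    }
    return pool;
  }

  /** Shut the pool down; a later get() would create a fresh one. */
  public synchronized void close() {
    if (pool != null) {
      pool.shutdown();
      pool = null;
    }
  }
}
```

With a cap of 500, several jobs committing at once would all call `get()` and submit to the same executor, so the total thread count stays bounded regardless of how many jobs are active.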
Finally: should the file commit pool size be > the size of the pool of manifest
readers? I think it could be, but the ratio should be fairly low.