Steve Loughran created HADOOP-17434:
---------------------------------------
Summary: Improve S3A upload statistics collection from
ProgressEvent callbacks
Key: HADOOP-17434
URL: https://issues.apache.org/jira/browse/HADOOP-17434
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran
Collection of S3A upload stats from ProgressEvent callbacks can be improved.
There are two similar but different listener implementations:
* org.apache.hadoop.fs.s3a.S3ABlockOutputStream.BlockUploadProgress
* org.apache.hadoop.fs.s3a.ProgressableProgressListener, used on simple PUT
calls.
Both call back into S3A FS to incrementWriteOperations; BlockUploadProgress
also updates S3AInstrumentation/IOStatistics.
* I'm not 100% confident that BlockUploadProgress is updating things
(especially the gauges of pending bytes) at the right time, or that completion
is being handled.
* The other listener, ProgressableProgressListener, doesn't update
S3AInstrumentation, so its numbers are lost.
* There's no incremental updating during
{{CommitOperations.uploadFileToPendingCommit()}}, which only calls
Progressable.progress() once per block.
* Nor is there any incremental updating in MultipartUploader.
Proposed:
* A single progress listener which updates BlockOutputStreamStatistics, used by
all interfaces.
* WriteOperations to help set this up for callers.
* Its uploadPart API to take a Progressable (or the progress listener to use
for uploading that part).
* The multipart upload API to also take a Progressable; this would help
distcp-like applications.
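
A rough sketch of what such a shared listener could look like. Everything here is hypothetical and deliberately free of AWS SDK and Hadoop types: {{TransferEvent}} stands in for the SDK's progress event types, {{UploadStatistics}} for BlockOutputStreamStatistics, and a plain {{Runnable}} for Progressable.progress(). The point is only that one class owns both the gauge updates and the progress callback, so every upload path reports the same numbers.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the SDK's progress event types. */
enum TransferEvent { BYTES_TRANSFERRED, COMPLETED, FAILED }

/** Hypothetical stand-in for BlockOutputStreamStatistics. */
final class UploadStatistics {
  final AtomicLong bytesPending = new AtomicLong();     // gauge
  final AtomicLong bytesUploaded = new AtomicLong();    // counter
  final AtomicLong uploadsCompleted = new AtomicLong(); // counter
  final AtomicLong uploadsFailed = new AtomicLong();    // counter
}

/** One instance per block/part/PUT; shared by all upload paths. */
final class UnifiedProgressListener {
  private final UploadStatistics stats;
  private final Runnable progress; // stand-in for Progressable.progress(); may be null
  private final long size;         // bytes this upload is expected to send
  private long transferred;        // bytes reported so far
  private boolean finished;        // true once COMPLETED or FAILED was seen

  UnifiedProgressListener(UploadStatistics stats, long size, Runnable progress) {
    this.stats = stats;
    this.size = size;
    this.progress = progress;
    // Assumption: bytes count as pending from the moment the upload is queued.
    stats.bytesPending.addAndGet(size);
  }

  synchronized void progressChanged(TransferEvent event, long bytes) {
    if (finished) {
      return; // ignore stray events after the terminal one
    }
    switch (event) {
      case BYTES_TRANSFERRED: {
        // Clamp so over-reporting can never push the gauge below zero.
        long delta = Math.max(0L, Math.min(bytes, size - transferred));
        transferred += delta;
        stats.bytesPending.addAndGet(-delta);
        stats.bytesUploaded.addAndGet(delta);
        break;
      }
      case COMPLETED:
      case FAILED:
        finished = true;
        // Drain whatever was never reported, so the gauge returns to zero.
        stats.bytesPending.addAndGet(-(size - transferred));
        if (event == TransferEvent.COMPLETED) {
          stats.uploadsCompleted.incrementAndGet();
        } else {
          stats.uploadsFailed.incrementAndGet();
        }
        break;
      default:
        break;
    }
    if (progress != null) {
      progress.run(); // incremental progress on every event, not once per block
    }
  }
}
```

In this sketch a caller would create one listener per part, e.g. {{new UnifiedProgressListener(stats, partSize, progressable::progress)}}, and hand it to whatever performs the upload. How retried parts should be counted is left open here.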
Plus ITests to verify that the gauges come out right: at the end of each
operation, the number of bytes pending upload == 0 and the number of bytes
uploaded == the original size.
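
The two end-of-operation invariants can be illustrated with a small simulation. This is not a real ITest: there is no S3 or S3A code involved, and {{SimulatedStats}}, {{simulateUpload()}} and {{verifyGauges()}} are hypothetical names invented for the sketch. A real test would read the values from the stream or filesystem statistics instead.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical pair of values a test would read from the upload statistics. */
final class SimulatedStats {
  final AtomicLong bytesPending = new AtomicLong();  // gauge
  final AtomicLong bytesUploaded = new AtomicLong(); // counter
}

final class GaugeInvariantSketch {

  /** Simulate a block-by-block upload, reporting progress in small chunks. */
  static SimulatedStats simulateUpload(long totalSize, long blockSize, long chunkSize) {
    if (blockSize <= 0 || chunkSize <= 0) {
      throw new IllegalArgumentException("block and chunk sizes must be positive");
    }
    SimulatedStats stats = new SimulatedStats();
    for (long offset = 0; offset < totalSize; offset += blockSize) {
      long block = Math.min(blockSize, totalSize - offset);
      stats.bytesPending.addAndGet(block); // block queued for upload
      for (long sent = 0; sent < block; sent += chunkSize) {
        long chunk = Math.min(chunkSize, block - sent);
        stats.bytesPending.addAndGet(-chunk); // incremental progress event
        stats.bytesUploaded.addAndGet(chunk);
      }
    }
    return stats;
  }

  /** The two invariants: nothing pending, everything uploaded. */
  static void verifyGauges(SimulatedStats stats, long originalSize) {
    if (stats.bytesPending.get() != 0) {
      throw new AssertionError("bytes pending upload: " + stats.bytesPending.get());
    }
    if (stats.bytesUploaded.get() != originalSize) {
      throw new AssertionError("bytes uploaded " + stats.bytesUploaded.get()
          + " != original size " + originalSize);
    }
  }
}
```

For example, {{verifyGauges(simulateUpload(size, blockSize, chunkSize), size)}} should pass for any non-negative size, including sizes that are not a multiple of the block size.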