[ https://issues.apache.org/jira/browse/HADOOP-18948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779567#comment-17779567 ]
ASF GitHub Bot commented on HADOOP-18948:
-----------------------------------------
steveloughran commented on code in PR #6218:
URL: https://github.com/apache/hadoop/pull/6218#discussion_r1372034162
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Statistic.java:
##########
@@ -242,7 +242,10 @@ public enum Statistic {
StoreStatisticNames.OBJECT_MULTIPART_UPLOAD_ABORTED,
"Object multipart upload aborted",
TYPE_DURATION),
- OBJECT_PUT_REQUESTS(
+ OBJECT_MULTIPART_UPLOAD_LIST(
+ StoreStatisticNames.OBJECT_MULTIPART_UPLOAD_LIST,
+ "Object multipart upload aborted",
Review Comment:
ooh, well spotted!
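The slip caught in review is that the new OBJECT_MULTIPART_UPLOAD_LIST entry reuses the description "Object multipart upload aborted". Below is a minimal sketch of a uniqueness check that a unit test could use to catch this kind of copy-paste error. It uses a simplified stand-in for the enum, and the names DemoStatistic and DemoStatisticType, the symbol strings, and the replacement description are all hypothetical, not the text the PR finally used.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the statistic type used by the real enum. */
enum DemoStatisticType { COUNTER, DURATION }

/**
 * Hypothetical, trimmed-down copy of the Statistic enum pattern: each
 * constant carries a symbol, a human-readable description and a type.
 */
enum DemoStatistic {
  OBJECT_MULTIPART_UPLOAD_ABORTED(
      "object_multipart_aborted",      // placeholder symbol
      "Object multipart upload aborted",
      DemoStatisticType.DURATION),
  OBJECT_MULTIPART_UPLOAD_LIST(
      "object_multipart_list",         // placeholder symbol
      "Object multipart upload list",  // placeholder wording; must not repeat "aborted"
      DemoStatisticType.DURATION);

  private final String symbol;
  private final String description;
  private final DemoStatisticType type;

  DemoStatistic(String symbol, String description, DemoStatisticType type) {
    this.symbol = symbol;
    this.description = description;
    this.type = type;
  }

  String getSymbol() { return symbol; }
  String getDescription() { return description; }
  DemoStatisticType getType() { return type; }

  /** Returns a description shared by two or more constants, or null if all are unique. */
  static String findDuplicateDescription() {
    Map<String, DemoStatistic> seen = new HashMap<>();
    for (DemoStatistic s : values()) {
      // putIfAbsent returns the earlier constant if this description was already used
      if (seen.putIfAbsent(s.description, s) != null) {
        return s.description;
      }
    }
    return null;
  }
}
```

Had the LIST entry kept the "aborted" text, findDuplicateDescription() would return "Object multipart upload aborted" instead of null.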
> S3A. Add option fs.s3a.directory.operations.purge.uploads to purge on rename/delete
> -----------------------------------------------------------------------------------
>
> Key: HADOOP-18948
> URL: https://issues.apache.org/jira/browse/HADOOP-18948
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.4.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Minor
> Labels: pull-request-available
>
> On third-party stores without lifecycle rules, it's possible to accrue many GB
> of pending multipart uploads, including from:
> * magic committer jobs where the Spark driver/MR AM failed before commit/abort
> * distcp jobs which time out and get aborted
> * any client code writing datasets that is interrupted before close.
> Although there's an option to purge pending uploads, it's dangerous: if
> any FS instance is created with it enabled, it can destroy in-flight work.
> Otherwise, the "hadoop s3guard uploads" command does work, but needs
> scheduling or manual execution.
> Proposed: add a new property {{fs.s3a.directory.operations.purge.uploads}}
> which will automatically cancel all pending uploads under a path:
> * delete: everything under the dir
> * rename: everything under the source dir
> This will be done in parallel with the normal operation, but there will be no
> attempt to issue the abortMultipartUploads calls in different threads; the
> assumption is that this is rare. It will be off by default, as on AWS people
> should have lifecycle rules for these things.
> + doc (third_party?)
> + add new counter/metric for abort operations, count and duration
> + test to include cost assertions
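The proposal can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the S3A implementation: UploadStore, InMemoryUploadStore, PendingUpload, DurationStat and DirectoryUploadPurger are hypothetical names, and the purgeEnabled flag merely stands in for fs.s3a.directory.operations.purge.uploads (off by default). The purge (one listing, then aborts issued one at a time) runs on another thread while the delete or rename proceeds on the caller's thread. Count and duration statistics for the list and abort calls make a cost assertion in a test straightforward.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical pending multipart upload: object key plus upload ID. */
final class PendingUpload {
  final String key;
  final String uploadId;
  PendingUpload(String key, String uploadId) { this.key = key; this.uploadId = uploadId; }
}

/** Hypothetical store abstraction; not the S3A or AWS SDK API. */
interface UploadStore {
  List<PendingUpload> listUploads(String prefix);
  void abortUpload(PendingUpload upload);
}

/** In-memory store, only here to make the sketch runnable. */
final class InMemoryUploadStore implements UploadStore {
  private final List<PendingUpload> uploads = new ArrayList<>();
  synchronized void start(String key, String id) { uploads.add(new PendingUpload(key, id)); }
  public synchronized List<PendingUpload> listUploads(String prefix) {
    List<PendingUpload> matches = new ArrayList<>();
    for (PendingUpload u : uploads) {
      if (u.key.startsWith(prefix)) matches.add(u);
    }
    return matches;
  }
  public synchronized void abortUpload(PendingUpload upload) { uploads.remove(upload); }
  synchronized int pendingCount() { return uploads.size(); }
}

/** Call count plus total duration, in the spirit of a duration statistic. */
final class DurationStat {
  private final AtomicLong count = new AtomicLong();
  private final AtomicLong totalNanos = new AtomicLong();
  /** Records one call that started at startNanos (from System.nanoTime()). */
  void record(long startNanos) {
    count.incrementAndGet();
    totalNanos.addAndGet(System.nanoTime() - startNanos);
  }
  long count() { return count.get(); }
  long totalNanos() { return totalNanos.get(); }
}

final class DirectoryUploadPurger {
  private final UploadStore store;
  private final boolean purgeEnabled; // stand-in for the proposed option, default false
  private final Executor executor;
  final DurationStat listStat = new DurationStat();
  final DurationStat abortStat = new DurationStat();

  DirectoryUploadPurger(UploadStore store, boolean purgeEnabled, Executor executor) {
    this.store = store;
    this.purgeEnabled = purgeEnabled;
    this.executor = executor;
  }

  /** "a/b" becomes "a/b/", so purging "a/b" never touches uploads under "a/bc/". */
  static String dirPrefix(String dir) { return dir.endsWith("/") ? dir : dir + "/"; }

  /**
   * Runs a directory operation while purging pending uploads under dir on
   * another thread. For delete, dir is the directory being deleted; for
   * rename, it is the source directory. Returns the number of uploads aborted.
   */
  int runWithPurge(String dir, Runnable operation) {
    CompletableFuture<Integer> purge = purgeEnabled
        ? CompletableFuture.supplyAsync(() -> purgeUnder(dir), executor)
        : CompletableFuture.completedFuture(0);
    operation.run();      // the normal delete/rename work, on the caller's thread
    return purge.join();  // wait for the purge before returning
  }

  /** One listing, then sequential aborts: no fan-out of abort calls. */
  private int purgeUnder(String dir) {
    long start = System.nanoTime();
    List<PendingUpload> pending = store.listUploads(dirPrefix(dir));
    listStat.record(start);
    for (PendingUpload upload : pending) {
      long abortStart = System.nanoTime();
      store.abortUpload(upload);
      abortStat.record(abortStart);
    }
    return pending.size();
  }
}
```

Listing with dir + "/" as the prefix keeps a purge of data/out from aborting uploads under data/output2/. If the main operation throws, the sketch lets the exception propagate without waiting for the purge; a real implementation would need to decide how to handle that case.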
--
This message was sent by Atlassian Jira
(v8.20.10#820010)