[ https://issues.apache.org/jira/browse/HADOOP-18948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17778825#comment-17778825 ]

ASF GitHub Bot commented on HADOOP-18948:
-----------------------------------------

steveloughran opened a new pull request, #6218:
URL: https://github.com/apache/hadoop/pull/6218

   
   Delete and rename get to optionally purge in-flight uploads.
   
   No test; just a fix for those itests which failed once the production code 
added the rule that the path to delete must always end in a "/". That's needed 
to avoid deleting uploads in adjacent directories.
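   
   To illustrate why the trailing "/" matters, here is a minimal sketch of a 
   purge under a directory prefix against the AWS SDK v2. The helper class and 
   method names are hypothetical, not the actual S3A code; without the trailing 
   "/", purging "dir" could also abort uploads under a sibling such as "dir2".
   
   ```java
   import software.amazon.awssdk.services.s3.S3Client;
   import software.amazon.awssdk.services.s3.model.AbortMultipartUploadRequest;
   import software.amazon.awssdk.services.s3.model.ListMultipartUploadsRequest;
   import software.amazon.awssdk.services.s3.model.MultipartUpload;
   
   /** Hypothetical helper, not the S3A implementation. */
   public final class UploadPurger {
   
     /**
      * Abort all in-flight multipart uploads under a directory.
      * Pagination of the listing is elided for brevity.
      */
     public static int abortUploadsUnderDir(S3Client s3, String bucket,
         String dirKey) {
       // The rule from this patch: a directory prefix must end in "/".
       String prefix = dirKey.endsWith("/") ? dirKey : dirKey + "/";
       ListMultipartUploadsRequest listRequest =
           ListMultipartUploadsRequest.builder()
               .bucket(bucket)
               .prefix(prefix)
               .build();
       int aborted = 0;
       for (MultipartUpload upload
           : s3.listMultipartUploads(listRequest).uploads()) {
         s3.abortMultipartUpload(AbortMultipartUploadRequest.builder()
             .bucket(bucket)
             .key(upload.key())
             .uploadId(upload.uploadId())
             .build());
         aborted++;
       }
       return aborted;
     }
   }
   ```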
   
   
   ### How was this patch tested?
   
   
   S3 London; fixed some s3guard tests which were trying to purge uploads on a 
path, not a directory.
   
   I do plan to write a new test.
   
   ### For code changes:
   
   - [X] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [X] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   




> S3A. Add option fs.s3a.directory.operations.purge.uploads to purge pending 
> uploads on rename/delete
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-18948
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18948
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.4.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>
> On third-party stores without lifecycle rules, it's possible to accrue many GB 
> of pending multipart uploads, including from
> * magic committer jobs where the spark driver/MR AM failed before commit/abort
> * distcp jobs which time out and get aborted
> * any client code writing datasets which is interrupted before close.
> Although there's a purge-pending-uploads option, that's dangerous: if any fs 
> is instantiated with it, it can destroy in-flight work.
> Otherwise, the "hadoop s3guard uploads" command does work, but needs 
> scheduling/manual execution.
> Proposed: add a new property {{fs.s3a.directory.operations.purge.uploads}} 
> which will automatically cancel all pending uploads under a path:
> * delete: everything under the dir
> * rename: everything under the source dir
> This will be done in parallel with the normal operation, but with no attempt 
> to post the abortMultipartUpload calls in different threads; the assumption 
> here is that such uploads are rare. It'll be off by default, as on AWS people 
> should have lifecycle rules for these things.
> + doc (third_party?)
> + add new counter/metric for abort operations, count and duration
> + test to include cost assertions
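
For context, this is how a client might opt in once the proposed option lands. 
A minimal sketch assuming the property name from the issue title; the bucket 
and path are placeholders.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PurgeOnDeleteExample {
  public static void main(String[] args) throws Exception {
    // Opt in to purging pending multipart uploads during directory
    // delete/rename; per the proposal this is off by default.
    Configuration conf = new Configuration();
    conf.setBoolean("fs.s3a.directory.operations.purge.uploads", true);

    // Placeholder bucket; any S3A filesystem instantiated with this
    // configuration picks up the option.
    FileSystem fs = FileSystem.get(new URI("s3a://example-bucket/"), conf);

    // With the option enabled, deleting the directory also cancels any
    // in-flight uploads whose keys lie under it.
    fs.delete(new Path("/datasets/incomplete-job"), true);
    fs.close();
  }
}
```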



