[ https://issues.apache.org/jira/browse/HADOOP-16430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886455#comment-16886455 ]

Steve Loughran commented on HADOOP-16430:
-----------------------------------------

For the curious

h3. Single-file delete
{code}
2019-07-16 17:40:27,487 [JUnit-testBulkRenameAndDelete] INFO  
impl.ITestPartialRenamesDeletes (DurationInfo.java:<init>(72)) - Starting: 
Creating 10000 files
2019-07-16 17:42:02,633 [JUnit-testBulkRenameAndDelete] INFO  
impl.ITestPartialRenamesDeletes (DurationInfo.java:close(87)) - Creating 10000 
files: duration 1:35.146s
2019-07-16 17:45:01,905 [JUnit-testBulkRenameAndDelete] INFO  
scale.ITestS3ADeleteManyFiles (DurationInfo.java:<init>(72)) - Starting: Rename 
s3a://hwdev-steve-ireland-new/test/testBulkRenameAndDelete/src to 
s3a://hwdev-steve-ireland-new/test/testBulkRenameAndDelete/final
2019-07-16 17:54:40,158 [JUnit-testBulkRenameAndDelete] INFO  
scale.ITestS3ADeleteManyFiles (DurationInfo.java:close(87)) - Rename 
s3a://hwdev-steve-ireland-new/test/testBulkRenameAndDelete/src to 
s3a://hwdev-steve-ireland-new/test/testBulkRenameAndDelete/final: duration 
9:38.253s
2019-07-16 17:54:40,162 [JUnit-testBulkRenameAndDelete] INFO  
scale.ITestS3ADeleteManyFiles 
(ITestS3ADeleteManyFiles.java:testBulkRenameAndDelete(78)) - Effective rename 
bandwidth 0.000147 MB/s
2019-07-16 17:58:17,999 [JUnit-testBulkRenameAndDelete] INFO  
scale.ITestS3ADeleteManyFiles (DurationInfo.java:<init>(72)) - Starting: Delete 
subtree s3a://hwdev-steve-ireland-new/test/testBulkRenameAndDelete/final
2019-07-16 18:04:49,074 [JUnit-testBulkRenameAndDelete] INFO  
scale.ITestS3ADeleteManyFiles (DurationInfo.java:close(87)) - Delete subtree 
s3a://hwdev-steve-ireland-new/test/testBulkRenameAndDelete/final: duration 
6:31.075s
2019-07-16 18:04:49,074 [JUnit-testBulkRenameAndDelete] INFO  
scale.ITestS3ADeleteManyFiles 
(ITestS3ADeleteManyFiles.java:testBulkRenameAndDelete(101)) - Nanoseconds per 
object deleted 39108 microseconds
2019-07-16 18:04:49,684 [teardown] INFO  contract.AbstractFSContractTestBase 
(AbstractFSContractTestBase.java:describe(255)) - closing file system
{code}

h3. Multi-file delete
{code}

2019-07-16 18:04:50,625 [JUnit-testBulkRenameAndDelete] INFO  
impl.ITestPartialRenamesDeletes (DurationInfo.java:<init>(72)) - Starting: 
Creating 10000 files
2019-07-16 18:06:33,128 [JUnit-testBulkRenameAndDelete] INFO  
impl.ITestPartialRenamesDeletes (DurationInfo.java:close(87)) - Creating 10000 
files: duration 1:42.503s
2019-07-16 18:09:35,521 [JUnit-testBulkRenameAndDelete] INFO  
scale.ITestS3ADeleteManyFiles (DurationInfo.java:<init>(72)) - Starting: Rename 
s3a://hwdev-steve-ireland-new/test/testBulkRenameAndDelete/src to 
s3a://hwdev-steve-ireland-new/test/testBulkRenameAndDelete/final
2019-07-16 18:13:12,657 [JUnit-testBulkRenameAndDelete] INFO  
scale.ITestS3ADeleteManyFiles (DurationInfo.java:close(87)) - Rename 
s3a://hwdev-steve-ireland-new/test/testBulkRenameAndDelete/src to 
s3a://hwdev-steve-ireland-new/test/testBulkRenameAndDelete/final: duration 
3:37.136s
2019-07-16 18:13:12,658 [JUnit-testBulkRenameAndDelete] INFO  
scale.ITestS3ADeleteManyFiles 
(ITestS3ADeleteManyFiles.java:testBulkRenameAndDelete(78)) - Effective rename 
bandwidth 0.000390 MB/s
2019-07-16 18:17:09,695 [JUnit-testBulkRenameAndDelete] INFO  
scale.ITestS3ADeleteManyFiles (DurationInfo.java:<init>(72)) - Starting: Delete 
subtree s3a://hwdev-steve-ireland-new/test/testBulkRenameAndDelete/final
2019-07-16 18:17:42,863 [JUnit-testBulkRenameAndDelete] INFO  
scale.ITestS3ADeleteManyFiles (DurationInfo.java:close(87)) - Delete subtree 
s3a://hwdev-steve-ireland-new/test/testBulkRenameAndDelete/final: duration 
0:33.168s
2019-07-16 18:17:42,864 [JUnit-testBulkRenameAndDelete] INFO  
scale.ITestS3ADeleteManyFiles 
(ITestS3ADeleteManyFiles.java:testBulkRenameAndDelete(101)) - Nanoseconds per 
object deleted 3316 microseconds
2019-07-16 18:17:43,588 [teardown] INFO  contract.AbstractFSContractTestBase 
(AbstractFSContractTestBase.java:describe(255)) - closing file system
{code}

The bulk rename took 9:38 with single-object delete vs 3:37 with multi-object delete: roughly 3x slower.

On the delete alone: multi-object delete took 0:33; single-object delete took 6:31, over 10x slower.

At the same time, single delete is *only* ~10x slower, even though the multi-object path now pushes out up to 1000 entries per DELETE request.
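A quick back-of-the-envelope check of the per-object costs implied by the logs above (Python used purely as a calculator here):

{code}
# Per-object delete cost from the quoted durations, 10,000 objects each run.
single_delete_s = 6 * 60 + 31.075   # 6:31.075 single-object delete
multi_delete_s = 33.168             # 0:33.168 multi-object delete
objects = 10_000

single_us = single_delete_s * 1e6 / objects   # microseconds per object
multi_us = multi_delete_s * 1e6 / objects     # microseconds per object
speedup = single_delete_s / multi_delete_s    # how much faster multi-delete is
{code}

This reproduces the per-object figures in the logs (~39108 us vs ~3316 us per object, an ~11.8x speedup).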

> S3AFilesystem.delete to incrementally update s3guard with deletions
> -------------------------------------------------------------------
>
>                 Key: HADOOP-16430
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16430
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>
> Currently S3AFilesystem.delete() only updates S3Guard at the end of a 
> paged delete operation. This makes it slow when there are many thousands of 
> files to delete, and increases the window of vulnerability to failures.
> Preferred
> * after every bulk DELETE call is issued to S3, queue the (async) delete of 
> all entries in that post.
> * at the end of the delete, await the completion of these operations.
> * inside S3AFS, also do the delete across threads, so that different HTTPS 
> connections can be used.
> This should maximise DDB throughput against tables which aren't IO limited.
> When executed against small, IOP-limited tables, the parallel DDB DELETE 
> batches will trigger a lot of throttling events; we should make sure these 
> aren't going to trigger failures.
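The steps proposed in the description could be sketched as follows. This is a minimal illustration in Python (the real implementation is Java inside S3AFileSystem); {{s3_bulk_delete}} and {{metastore_delete}} are hypothetical stand-ins for the S3 multi-object DELETE call and the S3Guard/DDB metadata removal:

{code}
# Sketch: incrementally update the metadata store after each bulk DELETE page,
# rather than waiting until the whole paged delete has finished.
from concurrent.futures import ThreadPoolExecutor, wait

PAGE_SIZE = 1000  # an S3 multi-object DELETE accepts up to 1000 keys per request

def delete_subtree(keys, s3_bulk_delete, metastore_delete, threads=4):
    """Issue paged bulk deletes; queue the async metastore update for each
    page as soon as its DELETE has been issued, then await them at the end."""
    with ThreadPoolExecutor(max_workers=threads) as pool:  # separate connections
        pending = []
        for i in range(0, len(keys), PAGE_SIZE):
            page = keys[i:i + PAGE_SIZE]
            s3_bulk_delete(page)           # one DELETE request per page
            # incremental: this page's metadata entries are removed while
            # later pages are still being deleted from S3
            pending.append(pool.submit(metastore_delete, page))
        wait(pending)                      # await completion of all updates
        for f in pending:
            f.result()                     # surface any metastore failure
{code}

The point of the pattern is that the window in which S3 and the metadata store disagree shrinks from the whole delete to roughly one page, and the DDB writes overlap with the remaining S3 DELETE calls.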



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
