[ 
https://issues.apache.org/jira/browse/HADOOP-13164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393498#comment-15393498
 ] 

Rajesh Balamohan commented on HADOOP-13164:
-------------------------------------------

Thanks for the input, Steve. I tried a single deleteObjects() request in place 
of listStatus + deleteObject in the same call. Performance in standalone mode 
looks promising, as it cuts down on the recursive calls. 

{noformat}

Test rename of files 10 times : Without Patch:
2016-07-26 11:48:31,408 [Thread-0] INFO  contract.ContractTestUtils 
(ContractTestUtils.java:end(1365)) - Duration of Time to execute 10 renames 
from /tests3a/src to /tests3a/dst: 75,397,648,000 nS
2016-07-26 11:48:31,409 [Thread-0] INFO  scale.TestS3ADirectoryPerformance 
(TestS3ADirectoryPerformance.java:testTimeToRenameFiles(177)) - Time per call: 
7,539,764,800
2016-07-26 11:50:58,555 [Thread-0] INFO  contract.ContractTestUtils 
(ContractTestUtils.java:end(1365)) - Duration of Time to execute 10 renames 
from /tests3a/src/1/2/3/4/5 to /tests3a/dst/1/2/3/4/5: 124,240,450,000 nS
2016-07-26 11:50:58,555 [Thread-0] INFO  scale.TestS3ADirectoryPerformance 
(TestS3ADirectoryPerformance.java:testTimeToRenameFiles(177)) - Time per call: 
12,424,045,000


Test rename of files 10 times : With deleteObjects request patch:
2016-07-26 11:58:17,544 [Thread-0] INFO  contract.ContractTestUtils 
(ContractTestUtils.java:end(1365)) - Duration of Time to execute 10 renames 
from /tests3a/src to /tests3a/dst: 69,002,710,000 nS
2016-07-26 11:58:17,545 [Thread-0] INFO  scale.TestS3ADirectoryPerformance 
(TestS3ADirectoryPerformance.java:testTimeToRenameFiles(177)) - Time per call: 
6,900,271,000
2016-07-26 11:59:45,022 [Thread-0] INFO  contract.ContractTestUtils 
(ContractTestUtils.java:end(1365)) - Duration of Time to execute 10 renames 
from /tests3a/src/1/2/3/4/5 to /tests3a/dst/1/2/3/4/5: 73,648,115,000 nS
2016-07-26 11:59:45,022 [Thread-0] INFO  scale.TestS3ADirectoryPerformance 
(TestS3ADirectoryPerformance.java:testTimeToRenameFiles(177)) - Time per call: 
7,364,811,500
{noformat}
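
For a rough comparison of the two runs above, the reduction in total rename 
time can be worked out from the logged durations. The snippet below is only a 
back-of-the-envelope calculation; the class and method names are made up for 
illustration, and the figures come from a single run each, so treat the 
percentages as indicative rather than as a benchmark result.

```java
// Illustrative arithmetic only; RenameTimingSketch is a hypothetical name.
class RenameTimingSketch {

    /** Percentage by which afterNs is smaller than beforeNs. */
    static double percentReduction(long beforeNs, long afterNs) {
        return 100.0 * (beforeNs - afterNs) / beforeNs;
    }

    public static void main(String[] args) {
        // Durations copied from the log lines above (10 renames each).
        long shallowBefore = 75_397_648_000L;   // /tests3a/src -> /tests3a/dst
        long shallowAfter  = 69_002_710_000L;
        long nestedBefore  = 124_240_450_000L;  // /tests3a/src/1/2/3/4/5 -> /tests3a/dst/1/2/3/4/5
        long nestedAfter   = 73_648_115_000L;

        System.out.printf("shallow: %.1f%%%n", percentReduction(shallowBefore, shallowAfter));
        System.out.printf("nested:  %.1f%%%n", percentReduction(nestedBefore, nestedAfter));
    }
}
```

That works out to roughly 8.5% for the shallow path and roughly 40.7% for the 
path nested five levels deeper, consistent with the saving growing with 
directory depth.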

I will upload a patch that calls deleteObjects() from 
deleteUnnecessaryFakeDirectories. 
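
To make the idea concrete, here is a minimal sketch of the approach, not the 
patch itself. It assumes fake directories are stored as keys ending in "/" and 
that the input is a file key without a trailing slash. The class name, the 
parentDirKeys helper and the BulkDeleter interface are hypothetical stand-ins; 
in real code the interface would wrap a bulk request such as the AWS SDK's 
AmazonS3.deleteObjects(DeleteObjectsRequest).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names exist in S3AFileSystem.
class FakeDirCleanupSketch {

    /** Stand-in for a single bulk-delete request against the object store. */
    interface BulkDeleter {
        void deleteAll(List<String> keys);
    }

    /**
     * Collects the fake-directory keys of every ancestor of the given object
     * key, deepest first. Assumes the key has no trailing slash.
     */
    static List<String> parentDirKeys(String key) {
        List<String> keys = new ArrayList<>();
        int slash = key.lastIndexOf('/');
        while (slash > 0) {
            // Keep the trailing slash: that is the directory marker key.
            keys.add(key.substring(0, slash + 1));
            slash = key.lastIndexOf('/', slash - 1);
        }
        return keys;
    }

    /** Issues one bulk delete for all ancestor markers; returns the key count. */
    static int deleteParentMarkers(String key, BulkDeleter deleter) {
        List<String> keys = parentDirKeys(key);
        if (!keys.isEmpty()) {
            deleter.deleteAll(keys);
        }
        return keys.size();
    }

    public static void main(String[] args) {
        // Print the batch instead of contacting a store.
        deleteParentMarkers("tests3a/dst/1/2/3/4/5/file",
            keys -> System.out.println("one request for: " + keys));
    }
}
```

The point of the sketch is that the number of round trips no longer depends on 
the nesting depth: every ancestor marker goes into one request instead of one 
status check and one delete per level.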

> Optimize S3AFileSystem::deleteUnnecessaryFakeDirectories
> --------------------------------------------------------
>
>                 Key: HADOOP-13164
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13164
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Rajesh Balamohan
>            Priority: Minor
>         Attachments: HADOOP-13164.branch-2.WIP.patch
>
>
> https://github.com/apache/hadoop/blob/27c4e90efce04e1b1302f668b5eb22412e00d033/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L1224
> deleteUnnecessaryFakeDirectories is invoked in S3AFileSystem during rename 
> and on output stream close() to purge any fake directories. Depending on how 
> deeply the folder structure is nested, it can take much longer, because it 
> invokes getFileStatus multiple times. Instead, it should be able to break 
> out of the loop once a non-empty directory is encountered. 


