[ https://issues.apache.org/jira/browse/HADOOP-13164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393498#comment-15393498 ]
Rajesh Balamohan commented on HADOOP-13164:
-------------------------------------------
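Thanks for the input, Steve. I tried a single deleteObjects() request in
place of listStatus + deleteObject in the same call. Performance in standalone
mode looks promising, since it cuts down the recursive calls: in the runs
below, the time per rename call drops from about 7.5 s to 6.9 s for
/tests3a/src and from about 12.4 s to 7.4 s for the nested path.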
{noformat}
Test rename of files 10 times : Without Patch:
2016-07-26 11:48:31,408 [Thread-0] INFO contract.ContractTestUtils (ContractTestUtils.java:end(1365)) - Duration of Time to execute 10 renames from /tests3a/src to /tests3a/dst: 75,397,648,000 nS
2016-07-26 11:48:31,409 [Thread-0] INFO scale.TestS3ADirectoryPerformance (TestS3ADirectoryPerformance.java:testTimeToRenameFiles(177)) - Time per call: 7,539,764,800
2016-07-26 11:50:58,555 [Thread-0] INFO contract.ContractTestUtils (ContractTestUtils.java:end(1365)) - Duration of Time to execute 10 renames from /tests3a/src/1/2/3/4/5 to /tests3a/dst/1/2/3/4/5: 124,240,450,000 nS
2016-07-26 11:50:58,555 [Thread-0] INFO scale.TestS3ADirectoryPerformance (TestS3ADirectoryPerformance.java:testTimeToRenameFiles(177)) - Time per call: 12,424,045,000

Test rename of files 10 times : With deleteObjects request patch:
2016-07-26 11:58:17,544 [Thread-0] INFO contract.ContractTestUtils (ContractTestUtils.java:end(1365)) - Duration of Time to execute 10 renames from /tests3a/src to /tests3a/dst: 69,002,710,000 nS
2016-07-26 11:58:17,545 [Thread-0] INFO scale.TestS3ADirectoryPerformance (TestS3ADirectoryPerformance.java:testTimeToRenameFiles(177)) - Time per call: 6,900,271,000
2016-07-26 11:59:45,022 [Thread-0] INFO contract.ContractTestUtils (ContractTestUtils.java:end(1365)) - Duration of Time to execute 10 renames from /tests3a/src/1/2/3/4/5 to /tests3a/dst/1/2/3/4/5: 73,648,115,000 nS
2016-07-26 11:59:45,022 [Thread-0] INFO scale.TestS3ADirectoryPerformance (TestS3ADirectoryPerformance.java:testTimeToRenameFiles(177)) - Time per call: 7,364,811,500
{noformat}
I will upload a patch that issues the deleteObjects() request from
deleteUnnecessaryFakeDirectories.
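
For illustration only, here is a minimal sketch of the key-gathering step such a change might rely on. It is not the patch itself: the class FakeDirKeys and the method parentDirKeys are hypothetical names, and the sketch assumes the object key is a file key without a leading or trailing slash. It collects the "dir/" marker key of every ancestor so that all of them could go into one bulk delete request rather than one probe and delete per level.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of S3AFileSystem. */
final class FakeDirKeys {
    private FakeDirKeys() {}

    /**
     * Returns the fake-directory marker key ("path/") of every ancestor of
     * objectKey, deepest first. Assumes objectKey names a file and has no
     * leading or trailing slash.
     */
    static List<String> parentDirKeys(String objectKey) {
        List<String> keys = new ArrayList<>();
        int slash = objectKey.lastIndexOf('/');
        while (slash > 0) {
            // Keep the trailing slash: that is the marker object's key.
            keys.add(objectKey.substring(0, slash + 1));
            slash = objectKey.lastIndexOf('/', slash - 1);
        }
        return keys;
    }
}
```

The resulting list would then be handed to a single multi-object delete (in the AWS SDK for Java v1, a DeleteObjectsRequest passed to AmazonS3.deleteObjects()). S3 accepts at most 1000 keys per such request, which is far more than a realistic directory depth.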
> Optimize S3AFileSystem::deleteUnnecessaryFakeDirectories
> --------------------------------------------------------
>
> Key: HADOOP-13164
> URL: https://issues.apache.org/jira/browse/HADOOP-13164
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 2.8.0
> Reporter: Rajesh Balamohan
> Priority: Minor
> Attachments: HADOOP-13164.branch-2.WIP.patch
>
>
> https://github.com/apache/hadoop/blob/27c4e90efce04e1b1302f668b5eb22412e00d033/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L1224
> deleteUnnecessaryFakeDirectories is invoked in S3AFileSystem during rename
> and on output stream close() to purge any fake directories. Depending on how
> deeply the folder structure is nested, this can take much longer than
> necessary, because it invokes getFileStatus once for each parent level.
> Instead, it should break out of the loop as soon as a non-empty directory
> is encountered.
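
The early-exit idea in the description above can be sketched roughly as follows. This is an assumption-laden illustration, not the S3AFileSystem code: FakeDirWalk and emptyFakeDirsToDelete are hypothetical names, and the isEmptyFakeDir predicate stands in for whatever getFileStatus-based check the real method performs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical illustration of breaking out of the parent walk early. */
final class FakeDirWalk {
    private FakeDirWalk() {}

    /**
     * Walks up from objectKey and collects ancestor marker keys ("path/")
     * for as long as the predicate reports an empty fake directory.
     * Stops at the first ancestor that is not one, so no further probes
     * are made for the levels above it.
     */
    static List<String> emptyFakeDirsToDelete(String objectKey,
                                              Predicate<String> isEmptyFakeDir) {
        List<String> toDelete = new ArrayList<>();
        int slash = objectKey.lastIndexOf('/');
        while (slash > 0) {
            String dirKey = objectKey.substring(0, slash + 1);
            if (!isEmptyFakeDir.test(dirKey)) {
                break; // non-empty (or missing) directory: stop probing
            }
            toDelete.add(dirKey);
            slash = objectKey.lastIndexOf('/', slash - 1);
        }
        return toDelete;
    }
}
```

With a predicate backed by a remote status call, the number of probes is bounded by the depth of the first non-empty ancestor rather than by the full depth of the path.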