[
https://issues.apache.org/jira/browse/HADOOP-13164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15306506#comment-15306506
]
Rajesh Balamohan commented on HADOOP-13164:
-------------------------------------------
Thanks [~steve_l]. When a file is written to a location, its folder and all of
that folder's parents become non-empty folders as per the FS contract. For example,
after writing
"s3a://<sample_bucket>/test/testDeleteNonEmptyDirNonRecursive/d1/d2/d3/childFile.txt",
all parent directories should exist as per the FS contract. If so, it should
not be necessary to invoke deleteUnnecessaryFakeDirectories on every file close
operation. Please correct me if my understanding is wrong. Even in rename, the
iteration can stop once a non-empty folder is encountered in the path.
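
To illustrate the early-exit idea, here is a minimal sketch against an in-memory
model. It is not the S3AFileSystem code. The class FakeDirCleanupSketch, the
helpers parentOf and cleanup, the Set<String> standing in for the bucket, and the
convention that a fake directory is a key ending in "/" are all assumptions made
for illustration. Each loop iteration stands in for one getFileStatus probe. The
early stop also assumes that an ancestor without a marker has no markers above it.

```java
import java.util.Set;
import java.util.TreeSet;

public class FakeDirCleanupSketch {

    /** Returns the parent key of "a/b/c" as "a/b", or "" at the top level. */
    static String parentOf(String key) {
        int slash = key.lastIndexOf('/');
        return slash < 0 ? "" : key.substring(0, slash);
    }

    /**
     * Walks from the parent of fileKey towards the root and deletes any
     * fake-directory marker (key + "/") found on the way.
     * Returns the number of simulated status probes.
     * With earlyStop, the walk ends at the first ancestor that has no marker,
     * on the assumption (hypothetical) that nothing above it has one either.
     */
    static int cleanup(Set<String> store, String fileKey, boolean earlyStop) {
        int probes = 0;
        String dir = parentOf(fileKey);
        while (!dir.isEmpty()) {
            probes++;                          // stands in for one getFileStatus call
            boolean hadMarker = store.remove(dir + "/");
            if (!hadMarker && earlyStop) {
                break;                         // non-empty directory reached: stop here
            }
            dir = parentOf(dir);
        }
        return probes;
    }

    /** Builds a store holding one file plus the marker of its parent directory. */
    static Set<String> sampleStore(String fileKey) {
        Set<String> store = new TreeSet<>();
        store.add(fileKey);
        store.add(parentOf(fileKey) + "/");
        return store;
    }

    public static void main(String[] args) {
        String file = "test/testDeleteNonEmptyDirNonRecursive/d1/d2/d3/childFile.txt";
        System.out.println("full walk probes:  " + cleanup(sampleStore(file), file, false));
        System.out.println("early stop probes: " + cleanup(sampleStore(file), file, true));
    }
}
```

For the sample path above, the full walk probes every ancestor, while the
early-stop variant ends one level above the deleted marker. Whether that
assumption holds for a real bucket is exactly the question raised in this comment.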
> Optimize S3AFileSystem::deleteUnnecessaryFakeDirectories
> --------------------------------------------------------
>
> Key: HADOOP-13164
> URL: https://issues.apache.org/jira/browse/HADOOP-13164
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 2.8.0
> Reporter: Rajesh Balamohan
> Priority: Minor
>
> https://github.com/apache/hadoop/blob/27c4e90efce04e1b1302f668b5eb22412e00d033/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L1224
> deleteUnnecessaryFakeDirectories is invoked in S3AFileSystem during rename
> and on output stream close() to purge any fake directories. Depending on the
> nesting depth of the folder structure, this can take a long time because it
> invokes getFileStatus multiple times. Instead, it should be able to break
> out of the loop once a non-empty directory is encountered.