[
https://issues.apache.org/jira/browse/HADOOP-13164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392309#comment-15392309
]
Steve Loughran commented on HADOOP-13164:
-----------------------------------------
You need a story for error handling here. Bear in mind that
removeKey(nonexistentpath) will fail at the AWS SDK layer; your code will need
to handle that. The code you cut did that by swallowing the exceptions. I'd
expect that to continue.
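A minimal sketch of that swallow-and-continue handling. It uses a hypothetical `KeyStore` interface and in-memory store in place of the real client; the interface, the `removeKey` signature shown here and the exception type are assumptions for illustration, not the S3A or AWS SDK API.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the object store client; not the AWS SDK API. */
interface KeyStore {
  void removeKey(String key) throws IOException;
}

/** In-memory store that fails on a missing key, mimicking the failure described above. */
class InMemoryKeyStore implements KeyStore {
  final Set<String> keys = new HashSet<>();

  @Override
  public void removeKey(String key) throws IOException {
    if (!keys.remove(key)) {
      throw new FileNotFoundException("No such key: " + key);
    }
  }
}

class FakeDirCleaner {
  /**
   * Best-effort delete of a fake directory marker.
   *
   * @return true if the marker was deleted, false if the failure was swallowed
   */
  static boolean deleteQuietly(KeyStore store, String key) {
    try {
      store.removeKey(key);
      return true;
    } catch (IOException e) {
      // Swallowed on purpose: a missing marker is an expected case here.
      // A real implementation would probably log this at debug level.
      return false;
    }
  }
}
```

Returning a boolean rather than nothing keeps the failure invisible to callers while still letting a test or a metric observe how often the delete was a no-op.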
I've been thinking we can do more than just delete with an async thread: you can
do parent dir creation on a delete operation, or validation in a create() call
that there is no parent directory that is actually a file (this could be
launched in the create(), with the result awaited on/checked in the
close()/first PUT). That argues for having an executor that takes a queue of
actions pushed down, of which dir deletion is only one.
We'd need queue length as another metric; actually, a count of the number of
fake directory delete calls made and of actual deletes executed. That'd be
something that the tests can use.
I'd like to see a way to test this. Especially the shutdown process.
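One way the queue of actions, the counters and a testable shutdown could fit together, sketched with nothing but `java.util.concurrent`. The class `BackgroundActionQueue` and all of its methods are hypothetical names for illustration; none of this is existing S3AFileSystem code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical background queue for housekeeping actions such as fake dir deletion. */
class BackgroundActionQueue {
  // A single worker thread, so actions run in submission order.
  private final ExecutorService executor = Executors.newSingleThreadExecutor();
  private final AtomicLong submitted = new AtomicLong();
  private final AtomicLong executed = new AtomicLong();
  private final AtomicLong failed = new AtomicLong();

  /** Queue an action; its failures are counted, never propagated to the caller. */
  void submit(Runnable action) {
    submitted.incrementAndGet();
    try {
      executor.execute(() -> {
        try {
          action.run();
          executed.incrementAndGet();
        } catch (RuntimeException e) {
          failed.incrementAndGet();
        }
      });
    } catch (RejectedExecutionException e) {
      // The queue was already shut down: undo the count and tell the caller.
      submitted.decrementAndGet();
      throw e;
    }
  }

  long getSubmitted() { return submitted.get(); }
  long getExecuted() { return executed.get(); }
  long getFailed() { return failed.get(); }

  /** Approximate queue length: actions submitted but not yet finished. */
  long getPending() {
    return submitted.get() - executed.get() - failed.get();
  }

  /**
   * Stop accepting work and wait for already queued actions to drain.
   *
   * @return true if everything finished within the timeout
   */
  boolean shutdown(long timeout, TimeUnit unit) {
    executor.shutdown();
    try {
      return executor.awaitTermination(timeout, unit);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

A test could submit a mix of succeeding and failing actions, call `shutdown`, and then assert on the counters. That exercises the shutdown path without needing a real object store.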
I'm also wondering whether we can create new sequences of operations which
could lose data. Something like
{code}
touch("/path/1/2/3/4/5/6")
delete("/path/1/2/3/4/5/6")
echo("/path", "important text")
{code}
If that recursive delete hasn't completed before that echo operation happens,
data gets lost.
Thinking about this some more, I really worry about the async behaviour. Maybe
we should try to optimise the sync one as a single removeKeys on all the
parents. Again, we could do a scale test to play with the options here, to
measure what made the lowest # of calls, and the time it took.
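To make the "single removeKeys on all the parents" option concrete, here is a small sketch of how the list of parent keys could be built up front. It assumes fake directory markers are stored as the ancestor path plus a trailing "/"; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ParentKeys {
  /**
   * List the fake directory marker keys for every ancestor of an object key,
   * deepest first. Assumes a marker is the ancestor path plus a trailing "/",
   * and that the key passed in has no trailing "/" itself.
   */
  static List<String> parentMarkerKeys(String key) {
    List<String> markers = new ArrayList<>();
    int slash = key.lastIndexOf('/');
    // Stop at index 0 so a leading "/" (the root) is never included.
    while (slash > 0) {
      markers.add(key.substring(0, slash + 1));
      slash = key.lastIndexOf('/', slash - 1);
    }
    return markers;
  }
}
```

For the key `path/1/2/3/4/5/6` this yields six marker keys, from `path/1/2/3/4/5/` up to `path/`. Handing that whole list to one bulk delete would cost a single request instead of one per ancestor, which is the kind of call count a scale test could compare against the per-directory approach.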
> Optimize S3AFileSystem::deleteUnnecessaryFakeDirectories
> --------------------------------------------------------
>
> Key: HADOOP-13164
> URL: https://issues.apache.org/jira/browse/HADOOP-13164
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 2.8.0
> Reporter: Rajesh Balamohan
> Priority: Minor
> Attachments: HADOOP-13164.branch-2.WIP.patch
>
>
> https://github.com/apache/hadoop/blob/27c4e90efce04e1b1302f668b5eb22412e00d033/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L1224
> deleteUnnecessaryFakeDirectories is invoked in S3AFileSystem during rename
> and on outputstream close() to purge any fake directories. Depending on the
> nesting in the folder structure, it might take a lot longer time as it
> invokes getFileStatus multiple times. Instead, it should be able to break
> out of the loop once a non-empty directory is encountered.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)