[ 
https://issues.apache.org/jira/browse/HADOOP-16090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16775059#comment-16775059
 ] 

Steve Loughran commented on HADOOP-16090:
-----------------------------------------

In HADOOP-16134 I've done a first PoC of what a write operations context would 
be. 

# that PoC is flawed, as I note
# any context gets complex fast

To fix your problem in a way which can be backported, have a look at what 
{{deleteUnnecessaryFakeDirectories()}} did before HADOOP-13164: it probed for 
each marker and only deleted it if it was there. Something similar to that 
could be made optional again, but not as a naive reinstatement of that 
operation. getFileStatus() is overkill, since there's only a marker path + "/" 
entry to be looked for, and S3Guard does expect directory entries to exist 
anyway.

Better:
* if a "fs.s3a.versioned.store" flag is true, switch to walking up the tree
* use getObjectMetadata() to look for the specific entry of a "/" file; catch 
FNFEs as not a problem
* stop on the first marker, delete it.
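
As a rough illustration of the walk described in the list above, here is a minimal sketch. It is not S3A code: {{MarkerStore}}, {{InMemoryStore}} and {{deleteFirstParentMarker()}} are hypothetical names, and the {{exists()}} call only stands in for a getObjectMetadata() probe that treats a FileNotFoundException as "not there".

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch only: every type here is a hypothetical stand-in, not an S3A class. */
public class FakeDirMarkerWalk {

  /** Minimal view of the store: a HEAD-style probe and a single-key delete. */
  interface MarkerStore {
    boolean exists(String key);  // stands in for getObjectMetadata() + catching FNFE
    void delete(String key);     // stands in for a single-object DELETE
  }

  /** In-memory store that counts requests, purely for illustration. */
  static final class InMemoryStore implements MarkerStore {
    final Set<String> keys = new HashSet<>();
    int probes;
    int deletes;

    @Override
    public boolean exists(String key) {
      probes++;
      return keys.contains(key);
    }

    @Override
    public void delete(String key) {
      deletes++;
      keys.remove(key);
    }
  }

  /**
   * Walk up from the parent of fileKey, probing for "dir/" marker objects.
   * Deletes the first marker found and stops there.
   * Returns the deleted marker key, or null if no marker was found.
   */
  static String deleteFirstParentMarker(MarkerStore store, String fileKey) {
    int slash = fileKey.lastIndexOf('/');
    while (slash > 0) {
      String marker = fileKey.substring(0, slash + 1); // e.g. "a/b/c/"
      if (store.exists(marker)) {
        store.delete(marker);
        return marker;
      }
      slash = fileKey.lastIndexOf('/', slash - 1);
    }
    return null;
  }
}
```

With a marker at "a/b/" and a write to "a/b/c/file.txt", this sketch issues two probes ("a/b/c/", then "a/b/") and one delete, instead of a blind delete for every ancestor.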

The operation will be O(depth), so it makes writing and renaming files slower 
the deeper you get, but there's ~no solution there

(there will be with a WriteOperationsContext if it notes when the dest file is 
being overwritten ... in that case we can assume there are no parent entries 
and don't even issue a DELETE request)

Anyway: avoid a write context; let's start off with an option to say "versioned 
fs", and I'll backport it easily. 

Test-wise, there are FS counters for fake directory deletions in 
getStorageStatistics; if you create a new FS instance declaring the dest is 
versioned, you'd expect to see different count values for deletes and for 
OBJECT_METADATA_REQUESTS. That'll be how we can verify that the behaviour of 
the store changed.
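
To make the expected counter difference concrete, here is a small self-contained simulation. It is only a sketch under assumptions: {{CountingStore}} and {{cleanupAfterWrite()}} are hypothetical, and the two plain fields stand in for the fake-directory-delete and OBJECT_METADATA_REQUESTS values a real test would read from getStorageStatistics.

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch only: simulates the two cleanup strategies and counts their requests. */
public class VersionedStoreCounters {

  /** Hypothetical store whose fields stand in for the FS statistics counters. */
  static final class CountingStore {
    final Set<String> keys = new HashSet<>();
    long metadataRequests; // stand-in for OBJECT_METADATA_REQUESTS
    long fakeDirDeletes;   // stand-in for the fake directory delete counter
  }

  /**
   * Clean up parent markers after writing fileKey.
   * versioned == false: blind delete of every ancestor marker key.
   * versioned == true: probe each ancestor, delete the first marker found, stop.
   */
  static void cleanupAfterWrite(CountingStore store, String fileKey, boolean versioned) {
    for (int slash = fileKey.lastIndexOf('/');
         slash > 0;
         slash = fileKey.lastIndexOf('/', slash - 1)) {
      String marker = fileKey.substring(0, slash + 1);
      if (!versioned) {
        store.fakeDirDeletes++;      // delete issued whether or not the key exists
        store.keys.remove(marker);
      } else {
        store.metadataRequests++;    // probe first
        if (store.keys.remove(marker)) {
          store.fakeDirDeletes++;    // delete only what was actually there
          return;
        }
      }
    }
  }
}
```

For a write to "a/b/c/file" with no markers present, the non-versioned path in this simulation records three deletes and no probes, while the versioned path records three probes and no deletes; a real test would assert on the same kind of difference.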



> deleteUnnecessaryFakeDirectories() creates unnecessary delete markers in a 
> versioned S3 bucket
> ----------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-16090
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16090
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.1
>            Reporter: Dmitri Chmelev
>            Assignee: Dmitri Chmelev
>            Priority: Minor
>
> The fix to avoid calls to getFileStatus() for each path component in 
> deleteUnnecessaryFakeDirectories() (HADOOP-13164) results in accumulation of 
> delete markers in versioned S3 buckets. The above patch replaced 
> getFileStatus() checks with a single batch delete request formed by 
> generating all ancestor keys formed from a given path. Since the delete 
> request is not checking for existence of fake directories, it will create a 
> delete marker for every path component that did not exist (or was previously 
> deleted). Note that issuing a DELETE request without specifying a version ID 
> will always create a new delete marker, even if one already exists ([AWS S3 
> Developer 
> Guide|https://docs.aws.amazon.com/AmazonS3/latest/dev/RemDelMarker.html])
> Since deleteUnnecessaryFakeDirectories() is called as a callback on 
> successful writes and on renames, delete markers accumulate rather quickly 
> and their rate of accumulation is inversely proportional to the depth of the 
> path. In other words, directories closer to the root will have more delete 
> markers than the leaves.
> This behavior negatively impacts the performance of getFileStatus() when it 
> has to issue a listObjects() request (especially v1), as the delete markers 
> have to be examined when the request searches for the first current 
> non-deleted version of an object following a given prefix.
> I did a quick comparison against 3.x and the issue is still present: 
> [https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2947|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2947]
>  
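
The accumulation pattern in the description above can be sketched as follows. This is an illustrative simulation, not the S3A implementation: {{ancestorKeys()}} and {{markersAfterWrites()}} are hypothetical helpers, and the model assumes (as the description states) that every unversioned DELETE on a versioned bucket adds one delete marker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: models delete-marker build-up from blind ancestor deletes. */
public class AncestorDeleteMarkers {

  /** All "dir/" ancestor keys of a file key, deepest first. */
  static List<String> ancestorKeys(String fileKey) {
    List<String> keys = new ArrayList<>();
    for (int slash = fileKey.lastIndexOf('/');
         slash > 0;
         slash = fileKey.lastIndexOf('/', slash - 1)) {
      keys.add(fileKey.substring(0, slash + 1));
    }
    return keys;
  }

  /** Count the delete markers each ancestor key collects after the given writes. */
  static Map<String, Integer> markersAfterWrites(List<String> fileKeys) {
    Map<String, Integer> markers = new TreeMap<>();
    for (String fileKey : fileKeys) {
      for (String key : ancestorKeys(fileKey)) {
        markers.merge(key, 1, Integer::sum); // one new marker per blind DELETE
      }
    }
    return markers;
  }
}
```

Writing "a/b/f1", "a/c/f2" and "a/b/d/f3" in this model leaves three markers on "a/", two on "a/b/", and one each on "a/c/" and "a/b/d/", which matches the observation that keys closer to the root collect the most markers.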



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

