[
https://issues.apache.org/jira/browse/HADOOP-16725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981586#comment-16981586
]
Steve Loughran commented on HADOOP-16725:
-----------------------------------------
A straightforward fix is not to prune directories at all, by changing the
select query so that it only finds files.
It would be a lot more complicated and slower to actually identify childless
directories and prune them:
# prune all the files
# enumerate the dirs to prune
# build a depth-first list of directories (deepest first)
# for each one, probe for 1+ child; none found => delete
Note: for tombstones we can still prune directories.
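
The four steps above can be sketched as follows. This is only an illustration against a hypothetical in-memory store (a dict of path to entry), not the S3Guard DynamoDB code; the names Entry, has_child and prune are assumptions made up for the example, and the child probe here is a linear scan rather than a real store query.

```python
from typing import Dict, NamedTuple


class Entry(NamedTuple):
    """Hypothetical metadata entry; not the real S3Guard schema."""
    is_dir: bool
    mod_time: int  # last-updated timestamp, arbitrary units


def has_child(store: Dict[str, Entry], directory: str) -> bool:
    """Probe for at least one entry strictly beneath `directory`."""
    prefix = directory.rstrip("/") + "/"
    return any(p != directory and p.startswith(prefix) for p in store)


def prune(store: Dict[str, Entry], cutoff: int) -> None:
    """Remove entries older than `cutoff` without orphaning children."""
    # 1. prune all the expired files
    expired_files = [p for p, e in store.items()
                     if not e.is_dir and e.mod_time < cutoff]
    for path in expired_files:
        del store[path]

    # 2. enumerate the directories that are candidates for pruning
    expired_dirs = [p for p, e in store.items()
                    if e.is_dir and e.mod_time < cutoff]

    # 3. order them deepest first, so children are handled before parents
    expired_dirs.sort(key=lambda p: p.count("/"), reverse=True)

    # 4. probe each one for a child; delete only if none is found
    for directory in expired_dirs:
        if not has_child(store, directory):
            del store[directory]
```

Handling the deepest directories first means a parent whose only children were themselves pruned is seen as childless by the time it is probed, while a parent that still holds a newer file is kept.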
> s3guard prune can delete directories -leaving orphan children.
> --------------------------------------------------------------
>
> Key: HADOOP-16725
> URL: https://issues.apache.org/jira/browse/HADOOP-16725
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.0.3, 3.2.1, 3.1.3
> Reporter: Steve Loughran
> Priority: Critical
>
> When s3guard prune is invoked to delete entries not updated since a specific
> time, it doesn't check to see if an expired directory entry has any children.
> As a result, if a child is newer than the cut-off date, the dir entry can be
> removed but not the child. This can leave S3Guard in an inconsistent state.