[ https://issues.apache.org/jira/browse/HADOOP-16725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981586#comment-16981586 ]

Steve Loughran commented on HADOOP-16725:
-----------------------------------------

A straightforward fix is not to prune directories at all: change the select
query so that it only finds files.
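Here is a minimal sketch of that selection in Java. It runs against a hypothetical in-memory model: the `Entry` record, the `lastUpdated` field and `selectForPrune` are illustrative only, not the real S3Guard classes or its DynamoDB query. Entries older than the cutoff are selected, but live directory entries never are. Tombstoned directories stay eligible, in line with the note about tombstones.

```java
import java.util.List;
import java.util.stream.Collectors;

class FilesOnlyPrune {

    /** Hypothetical, simplified view of one S3Guard table entry. */
    record Entry(String path, boolean isDirectory, boolean isTombstone, long lastUpdated) {}

    /**
     * Select the entries a prune may delete: anything older than the cutoff,
     * except live directory entries. Tombstoned directories remain eligible.
     */
    static List<Entry> selectForPrune(List<Entry> entries, long cutoff) {
        return entries.stream()
                .filter(e -> e.lastUpdated() < cutoff)
                .filter(e -> !e.isDirectory() || e.isTombstone())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Entry> table = List.of(
                new Entry("/a", true, false, 10),           // old live directory: kept
                new Entry("/a/old.txt", false, false, 10),  // old file: selected
                new Entry("/a/new.txt", false, false, 100), // newer file: kept
                new Entry("/b", true, true, 10));           // old tombstoned directory: selected
        selectForPrune(table, 50).forEach(e -> System.out.println(e.path()));
    }
}
```

With a cutoff of 50 this prints `/a/old.txt` and `/b`. The expired live directory `/a` is never selected, so it cannot be removed out from under `/a/new.txt`.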

It would be a lot more complicated, and slower, to actually identify childless
directories and prune them:

# prune all the files
# enumerate the dirs to prune
# build a depth-first list of those directories
# for each one, probe for one or more children; if none are found, delete it
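The four steps above could look roughly like this. This is again a sketch over a hypothetical sorted in-memory map standing in for the S3Guard table; `ChildlessDirPrune`, `Entry` and `hasChild` are made up for illustration. Ordering the candidates deepest first means a directory that only becomes empty once its subdirectories are pruned is still caught in the same pass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class ChildlessDirPrune {

    /** Hypothetical, simplified S3Guard entry; not the real PathMetadata class. */
    record Entry(String path, boolean isDirectory, long lastUpdated) {}

    /** Number of '/' separators, used to order deeper directories first. */
    static int depth(String path) {
        return (int) path.chars().filter(c -> c == '/').count();
    }

    /** Probe for at least one child: any key under "dir/" in the sorted map. */
    static boolean hasChild(TreeMap<String, Entry> store, String dir) {
        String prefix = dir.endsWith("/") ? dir : dir + "/";
        String next = store.ceilingKey(prefix);
        return next != null && next.startsWith(prefix);
    }

    static void prune(TreeMap<String, Entry> store, long cutoff) {
        // 1. prune all the files older than the cutoff
        store.values().removeIf(e -> !e.isDirectory() && e.lastUpdated() < cutoff);

        // 2. enumerate the directories that are candidates for pruning
        List<String> dirs = new ArrayList<>();
        for (Entry e : store.values()) {
            if (e.isDirectory() && e.lastUpdated() < cutoff) {
                dirs.add(e.path());
            }
        }

        // 3. order them deepest first, so subdirectories are handled before parents
        dirs.sort((a, b) -> Integer.compare(depth(b), depth(a)));

        // 4. delete a directory only if no child entry remains under it
        for (String dir : dirs) {
            if (!hasChild(store, dir)) {
                store.remove(dir);
            }
        }
    }

    public static void main(String[] args) {
        TreeMap<String, Entry> store = new TreeMap<>();
        for (Entry e : List.of(
                new Entry("/a", true, 10),             // old dir with a newer child
                new Entry("/a/new.txt", false, 100),
                new Entry("/b", true, 10),             // old dir whose subtree is all old
                new Entry("/b/c", true, 10),
                new Entry("/b/c/old.txt", false, 10))) {
            store.put(e.path(), e);
        }
        prune(store, 50);
        System.out.println(store.keySet()); // [/a, /a/new.txt]
    }
}
```

Here `/b/c` goes once its old file is pruned, which then leaves `/b` childless. `/a` survives because `/a/new.txt` is newer than the cutoff. In this model each probe is a cheap sorted-map lookup. Against the real table, each probe would presumably be a separate request, which is one reason this approach is slower.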

note: for tombstones we can still prune directories.

> s3guard prune can delete directories -leaving orphan children.
> --------------------------------------------------------------
>
>                 Key: HADOOP-16725
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16725
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.3, 3.2.1, 3.1.3
>            Reporter: Steve Loughran
>            Priority: Critical
>
> When s3guard prune is invoked to delete entries not updated since a specific
> time, it doesn't check whether an expired directory entry has any children. As
> a result, if a child is newer than the cut-off date, the directory entry can be
> removed but not the child. This can leave S3Guard in an inconsistent state.
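Here is a small reproduction of the problem described above. It uses a hypothetical in-memory model, not the real `MetadataStore` API; `OrphanDemo`, `naivePrune` and `orphans` are illustrative names. A prune that removes every expired entry, directories included, deletes `/a` while keeping its newer child. A simple parent-exists check then reports the orphan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class OrphanDemo {

    /** Hypothetical, simplified S3Guard entry. */
    record Entry(String path, boolean isDirectory, long lastUpdated) {}

    /** Parent of "/a/b" is "/a"; top-level paths such as "/a" return null. */
    static String parent(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? null : path.substring(0, i);
    }

    /** The problematic prune: delete everything older than the cutoff, directories included. */
    static void naivePrune(TreeMap<String, Entry> store, long cutoff) {
        store.values().removeIf(e -> e.lastUpdated() < cutoff);
    }

    /** Entries whose parent directory entry is missing from the store. */
    static List<String> orphans(TreeMap<String, Entry> store) {
        List<String> result = new ArrayList<>();
        for (String path : store.keySet()) {
            String p = parent(path);
            if (p != null && !store.containsKey(p)) {
                result.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<String, Entry> store = new TreeMap<>();
        store.put("/a", new Entry("/a", true, 10));                   // dir older than the cutoff
        store.put("/a/new.txt", new Entry("/a/new.txt", false, 100)); // child newer than the cutoff
        naivePrune(store, 50);
        System.out.println(orphans(store)); // [/a/new.txt]
    }
}
```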



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
