[ 
https://issues.apache.org/jira/browse/HADOOP-16725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-16725.
-------------------------------------
    Resolution: Invalid

OK, my test is showing that this is not as bad as I thought.


Prior to HADOOP-16697, PUT operations on an S3Guard table update all parent 
dir entries, at least for the active operation, so single file writes 
always have a complete tree of parents, all of the same age.

* BulkOperationState reduces the parent writes so that on a bulk rename/commit 
a parent will only ever be written once. Thus, for any big directory rename, 
the parent dirs will be older than the child.
* The resetting of the auth mode of a dir in a prune will recreate the 
parent entry (with a timestamp == 0!), which will then rebuild all parent dirs.

Therefore: directories with pruned children are recreated; only dirs without 
children would be pruned.

Even given that, I'm unable to recreate the failure. Why not?

* We're filtering on modtime, but directories don't have a modtime field, so 
there was an implicit "is_dir=false" filter.

Therefore: nothing to worry about after all. The logic was there, just hidden.
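
As a rough illustration of that implicit filter, here is a minimal Java sketch. 
It is not the S3Guard code: the class PruneFilterSketch, the method 
selectExpired and the in-memory map standing in for the table are all 
hypothetical, and the attribute name "mod_time" is an assumption. It only shows 
why a "modtime older than cutoff" condition never matches an entry that has no 
modtime at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class PruneFilterSketch {
    // Assumed attribute name for the modification time; illustrative only.
    static final String MOD_TIME = "mod_time";

    /**
     * Return the paths whose mod_time is strictly older than the cutoff.
     * Entries with no mod_time attribute (directories, in this sketch)
     * can never satisfy the comparison, so they are skipped implicitly.
     */
    static List<String> selectExpired(Map<String, Map<String, Long>> table,
                                      long cutoff) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Map<String, Long>> entry : table.entrySet()) {
            Long modTime = entry.getValue().get(MOD_TIME);
            if (modTime != null && modTime < cutoff) {
                expired.add(entry.getKey());
            }
        }
        return expired;
    }
}
```

With a directory entry that has no mod_time, an old file and a new file, only 
the old file is selected; the directory is never a prune candidate even though 
nothing in the condition mentions directories.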

I'm going to merge the work done here (new test, extended SELECT) into the 
HADOOP-16697 work; just putting it up as a separate PR for completeness.


> s3guard prune can delete directories -leaving orphan children.
> --------------------------------------------------------------
>
>                 Key: HADOOP-16725
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16725
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.3, 3.2.1, 3.1.3
>            Reporter: Steve Loughran
>            Priority: Critical
>
> When s3guard prune is invoked to delete entries not updated since a specific 
> time, it doesn't check to see if an expired directory entry has any children. 
> As a result, if a child is newer than the cut-off date, the dir entry can be 
> removed but not the child. This can leave S3Guard in an inconsistent state.



