steveloughran commented on issue #951: HADOOP-15183. S3Guard store becomes 
inconsistent after partial failure of rename
URL: https://github.com/apache/hadoop/pull/951#issuecomment-501315250
 
 
   @gabor, thanks for that. I have occasionally seen that failure in the 
ITestMagicCommitMR job, which is why we now log when it was deleted. What was 
the actual time when the test was run?
   
   What I can do is add some extra diagnostics in the operations where the 
committers update the DDB tables on commit, because this failure implies they 
didn't create an entry for the parent dir. 
   
   This all happens in finishedWrite(), which first calls 
MetastoreAddAncestors. In DDB, that goes up the tree to find the first parent 
dir which is in the store, and stops there. Then, in the metastore.put() 
afterwards, we add the new file and its parents, skipping those where there's 
already an entry.
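
   To make the flow above concrete, here is a minimal in-memory sketch. It is 
not the S3Guard code: the class, the map standing in for the DDB table, and 
every method name except finishedWrite are hypothetical, chosen only to show 
the "walk up until the first existing entry, then stop" behaviour.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the DDB metastore; not the S3Guard API. */
public class AncestorSketch {
  /** path -> true if the entry is a directory, false if it is a file. */
  final Map<String, Boolean> store = new HashMap<>();

  /** Parent of an absolute path, or null once the root is reached. */
  static String parent(String path) {
    int idx = path.lastIndexOf('/');
    return idx <= 0 ? null : path.substring(0, idx);
  }

  /** Walk up from the file's parent; stop at the first ancestor already stored. */
  List<String> missingAncestors(String file) {
    List<String> missing = new ArrayList<>();
    for (String p = parent(file); p != null && !store.containsKey(p); p = parent(p)) {
      missing.add(p);
    }
    return missing;
  }

  /** Mimics the described write path: add the missing ancestors, then the file. */
  void finishedWrite(String file) {
    for (String dir : missingAncestors(file)) {
      store.put(dir, true);
    }
    store.put(file, false);
  }
}
```

   Note what the early stop implies in this sketch: if /a/b/c is in the store 
but /a/b is not, a write under /a/b/c never looks at /a/b, so the gap stays.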
   
   I wonder if we can/should do more here.
   
   1. I'll add a check in addAncestors to throw a PathIOE if the ancestor 
scan finds a file. Let me know if you see it :)
   2. We should consider whether to do the addAncestors work at all, rather 
than just doing the put() and having it create the entire ancestor tree 
instead of stopping the moment it finds a parent entry in the DDB. That would 
recover from more inconsistent state, at the cost (over the entire bulk 
operation) of one more DDB write per directory level entry, while saving one 
get for every parent which doesn't have an entry in the store.
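
   As a rough illustration of the two options, the sketch below uses an 
in-memory map in place of the DDB table. Everything in it is an assumption 
made for the example: the class and method names are hypothetical, and a plain 
IOException stands in for the PathIOE so the code has no Hadoop dependency.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the two proposals; not the S3Guard implementation. */
public class AncestorProposalsSketch {
  /** path -> true if the entry is a directory, false if it is a file. */
  final Map<String, Boolean> store = new HashMap<>();

  /** Parent of an absolute path, or null once the root is reached. */
  static String parent(String path) {
    int idx = path.lastIndexOf('/');
    return idx <= 0 ? null : path.substring(0, idx);
  }

  /** Option 1: keep the stop-at-first-entry scan, but fail on a file ancestor. */
  void checkedAddAncestors(String file) throws IOException {
    for (String p = parent(file); p != null; p = parent(p)) {
      Boolean isDir = store.get(p);          // one get per level visited
      if (isDir == null) {
        store.put(p, true);                  // missing: add a directory entry
      } else if (!isDir) {
        throw new IOException("Ancestor is a file: " + p);
      } else {
        break;                               // existing directory: stop here
      }
    }
  }

  /**
   * Option 2: no scan; write every ancestor up to the root, then the file.
   * This sketch overwrites blindly, so it would also replace a file entry.
   */
  void putWithFullAncestorTree(String file) {
    for (String p = parent(file); p != null; p = parent(p)) {
      store.put(p, true);                    // one write per level, no get
    }
    store.put(file, false);
  }
}
```

   In this sketch, option 2 repairs a missing intermediate directory entry that 
the early-stopping scan would never reach, which is the extra recovery 
described in point 2.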
