steveloughran commented on issue #951: HADOOP-15183. S3Guard store becomes inconsistent after partial failure of rename
URL: https://github.com/apache/hadoop/pull/951#issuecomment-501315250

@gabor, thanks for that. I have sometimes seen that failure on ITestMagicCommitMR job, hence we now log when it was deleted. What was the actual time when the test was run?

What I can do is add some extra diagnostics in the operations where the committers update the DDB tables on commit, because this failure implies they didn't create an entry for the parent dir.

This all happens in finishedWrite(), which first calls MetastoreAddAncestors. In DDB, that goes up the tree to find the first parent dir which is in the store, and stops there. Then, in the metastore.put() afterwards, we add the new file and its parents, skipping those where there's already an entry.

I wonder if we can/should do more here:

1. I'll add a check in addAncestors to throw a PathIOE if the ancestor scan finds a file. Let me know if you see it :)
2. We should consider whether to do the addAncestors work at all, rather than just do the put() and have it create the entire ancestor tree instead of stopping the moment it finds a parent entry in the DDB. That would recover more inconsistent state, at the cost (over the entire bulk operation) of one more DDB write per directory-level entry, and one fewer get for every parent which doesn't have an entry in the store.
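
To make the trade-off in option 2 concrete, here is a minimal sketch of the two strategies against an in-memory map. It is not the S3Guard code: `AncestorSketch`, `putStoppingAtFirstEntry`, `putFullTree` and the `gets`/`puts` counters are all hypothetical names, a `HashMap` stands in for the DDB table, and `IllegalStateException` stands in for the PathIOE from option 1.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the DDB metastore; not the S3Guard API. */
class AncestorSketch {
    enum Kind { FILE, DIR }

    final Map<String, Kind> store = new HashMap<>();
    int gets = 0;   // simulated DDB reads
    int puts = 0;   // simulated DDB writes

    /** Parent of an absolute path, or null once the next step up would be the root. */
    static String parent(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? null : path.substring(0, i);
    }

    /** Walk up the tree; stop at the first ancestor already in the store. */
    List<String> missingAncestors(String file) {
        List<String> missing = new ArrayList<>();
        for (String p = parent(file); p != null; p = parent(p)) {
            gets++;
            Kind kind = store.get(p);
            if (kind == Kind.FILE) {
                // Stand-in for the proposed PathIOE check.
                throw new IllegalStateException("ancestor is a file: " + p);
            }
            if (kind == Kind.DIR) {
                break;  // everything above is assumed to be present
            }
            missing.add(p);
        }
        return missing;
    }

    /** Current behaviour as described: scan first, write only the missing entries. */
    void putStoppingAtFirstEntry(String file) {
        for (String dir : missingAncestors(file)) {
            store.put(dir, Kind.DIR);
            puts++;
        }
        store.put(file, Kind.FILE);
        puts++;
    }

    /** Option 2 (assumed shape): no scan, write every ancestor unconditionally. */
    void putFullTree(String file) {
        for (String p = parent(file); p != null; p = parent(p)) {
            store.put(p, Kind.DIR);  // blind overwrite; a real store must decide how to treat a file entry
            puts++;
        }
        store.put(file, Kind.FILE);
        puts++;
    }

    public static void main(String[] args) {
        // Inconsistent state: /a/b/c is present but its own parents are missing.
        AncestorSketch scan = new AncestorSketch();
        scan.store.put("/a/b/c", Kind.DIR);
        scan.putStoppingAtFirstEntry("/a/b/c/f");
        System.out.println("scan: gets=" + scan.gets + " puts=" + scan.puts
            + " has /a/b? " + scan.store.containsKey("/a/b"));

        AncestorSketch full = new AncestorSketch();
        full.store.put("/a/b/c", Kind.DIR);
        full.putFullTree("/a/b/c/f");
        System.out.println("full: gets=" + full.gets + " puts=" + full.puts
            + " has /a/b? " + full.store.containsKey("/a/b"));
    }
}
```

In this toy model the scan-first strategy leaves the missing `/a/b` and `/a` entries unrepaired because it trusts the first directory entry it finds, while the full-tree write repairs them at the price of extra writes. The real cost balance depends on how the DDB writes are batched, which the sketch does not model.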
