steveloughran commented on issue #951: HADOOP-15183. S3Guard store becomes inconsistent after partial failure of rename
URL: https://github.com/apache/hadoop/pull/951#issuecomment-503575079

Latest patch: doing the full matrix of test runs (s3guard/non, local/ddb, auth/non-auth).

`S3AFileSystem.finishedWrite()` now initiates a BulkUpdate if one wasn't already present, and closes it afterwards. This ensures that the findings of the addAncestors call are used in the putAndReturn call, which then skips the PUT request for every entry we already know exists.

This makes the DDB cost of writing a single file `depth * GET + (1 + missing parent count) * PUT`. Before: `depth * PUT`, as well as extra GET/PUT calls in addAncestors. PUTs cost more than GET calls, so this is a net saving.

The failing test ITestCommitOperations was tracked down to clock skew triggering a writeback of the getFileStatus result on the probes after the first commit, causing an intermittent failure in parallel test runs (under load == worse skew). Filed HADOOP-16382 for the underlying issue; for now the test simply resets the MetricDiff counter after the various probes.

I'm reaching the point where I can't see any more issues, and really need the insight/approval/criticism of others. In particular:

1. Is the ancestor tracking efficient and yet sufficient? It aims to eliminate the many spurious parent entries put in s3a commit operations and in parallel renames, as well as in simple file writes.
2. Does the metadata update strategy in ProgressiveRenameTracker hold together?
3. Is the rename algorithm in `org.apache.hadoop.fs.s3a.impl.RenameOperation` understandable and correct?

Feedback strongly encouraged.
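For reviewers who want to sanity-check the DDB cost comparison in the comment above, here is a minimal sketch of the request counts. It is not code from the patch: the class `DdbWriteCost` and its methods are hypothetical names, and because the comment does not quantify the "extra GET/PUT calls in addAncestors", the old-path extras are left as caller-supplied parameters rather than guessed.

```java
/**
 * Illustrative request-count model for writing one file under S3Guard/DDB.
 * Hypothetical helper, not part of the Hadoop codebase.
 */
final class DdbWriteCost {

    /** Number of GET and PUT requests issued for one file write. */
    static final class Cost {
        final int gets;
        final int puts;

        Cost(int gets, int puts) {
            this.gets = gets;
            this.puts = puts;
        }

        @Override
        public String toString() {
            return gets + " GET + " + puts + " PUT";
        }
    }

    /**
     * After the patch: depth * GET + (1 + missing parent count) * PUT.
     * One PUT for the file itself, one per parent not already in the store.
     */
    static Cost after(int depth, int missingParents) {
        return new Cost(depth, 1 + missingParents);
    }

    /**
     * Before the patch: depth * PUT, plus whatever extra GET/PUT calls
     * addAncestors made. The extras are assumptions supplied by the caller.
     */
    static Cost before(int depth, int extraGets, int extraPuts) {
        return new Cost(extraGets, depth + extraPuts);
    }

    public static void main(String[] args) {
        int depth = 5;           // assumed path depth, for illustration only
        int missingParents = 1;  // assumed number of parents absent from the store
        System.out.println("after:  " + after(depth, missingParents));
        System.out.println("before: " + before(depth, 0, 0) + " (plus addAncestors extras)");
    }
}
```

With these formulas the new path never issues more PUTs than the old one, provided the missing parent count is below the depth, and any saving comes at the price of `depth` GETs.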
