steveloughran commented on issue #951: HADOOP-15183. S3Guard store becomes 
inconsistent after partial failure of rename
URL: https://github.com/apache/hadoop/pull/951#issuecomment-503575079
 
 
   Latest patch: doing full matrix of test runs (s3guard/non, local/ddb, 
auth/non-auth)
   
   
   `S3AFileSystem.finishedWrite()` now initiates a BulkUpdate if one wasn't 
already present and closes it afterwards. This ensures that the findings 
of the `addAncestors` call are used in the `putAndReturn` call, which then 
issues no PUT request for any entry we already know exists. This makes the DDB 
cost of writing a single file depth * GET + (1 + missing parent count) * PUT. 
Before, it was depth * PUT as well as the extra GET/PUT calls in 
`addAncestors`. PUTs cost more than GETs, so this is a net saving.
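
   As a rough illustration of that arithmetic, here is a minimal sketch that only counts requests. The class and method names are hypothetical and not part of the patch, and the "before" figure is a lower bound because the extra GET/PUT calls in `addAncestors` are not quantified above.

```java
/** Back-of-envelope model of DynamoDB requests when writing one file. */
public class FinishedWriteCost {

    /** GETs after the patch: one per level of depth. */
    public static int getsAfter(int depth) {
        requireNonNegative(depth, "depth");
        return depth;
    }

    /** PUTs after the patch: the file itself plus each missing parent. */
    public static int putsAfter(int missingParents) {
        requireNonNegative(missingParents, "missingParents");
        return 1 + missingParents;
    }

    /**
     * PUTs before the patch: at least one per level of depth.
     * The extra GET/PUT calls made in addAncestors are NOT modelled here.
     */
    public static int putsBeforeLowerBound(int depth) {
        requireNonNegative(depth, "depth");
        return depth;
    }

    private static void requireNonNegative(int value, String name) {
        if (value < 0) {
            throw new IllegalArgumentException(name + " must be >= 0");
        }
    }

    public static void main(String[] args) {
        int depth = 5;          // illustrative value only
        int missingParents = 1; // illustrative value only
        System.out.println("after:  " + getsAfter(depth) + " GET, "
            + putsAfter(missingParents) + " PUT");
        System.out.println("before: at least "
            + putsBeforeLowerBound(depth) + " PUT");
    }
}
```

   With a deep path whose parents already exist, the PUT count drops from at least `depth` to 1, at the price of `depth` cheaper GETs.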
       
   The failing test `ITestCommitOperations` was tracked down to clock skew 
triggering a writeback of the `getFileStatus` result on the probes after the 
first commit, causing an intermittent failure in parallel test runs (more 
load means worse skew).
     
   Filed HADOOP-16382 for the underlying issue; for now the test simply resets 
the MetricDiff counter after the various probes.
   
   I'm reaching the point where I can't see any more issues, and really need 
the insight/approval/criticism of others. In particular:
   
   1. Is the ancestor tracking efficient and yet sufficient? It aims to 
eliminate the many spurious parent entries put in s3a commit operations and in 
parallel renames, as well as in simple file writes.
   2. Does the metadata update strategy in ProgressiveRenameTracker hold 
together?
   3. Is the rename algorithm in 
`org.apache.hadoop.fs.s3a.impl.RenameOperation` understandable and correct? 
   
   Feedback strongly encouraged.
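
   To make question 1 concrete, here is a minimal sketch of the general idea of ancestor tracking: remember which directories have already been recorded during one bulk operation, so later files under the same parents add no further parent entries. It is an in-memory illustration only. The class `AncestorTracker` and its methods are hypothetical, it does not reproduce the patch's actual logic, and it assumes that once an ancestor is known, all of that ancestor's own parents are known too.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical tracker of paths already recorded in one bulk operation. */
public class AncestorTracker {

    /** Paths recorded so far in this operation. */
    private final Set<String> known = new HashSet<>();

    /**
     * Returns the entries that still need a PUT for this file:
     * any not-yet-seen parents (topmost first), then the file itself.
     */
    public List<String> entriesToPut(String filePath) {
        List<String> toPut = new ArrayList<>();
        String p = parent(filePath);
        // Walk upwards; stop at the first ancestor that is already known
        // (assumption: its own parents were recorded along with it).
        while (p != null && known.add(p)) {
            toPut.add(0, p);
            p = parent(p);
        }
        known.add(filePath);
        toPut.add(filePath);
        return toPut;
    }

    /** Parent of an absolute path, or null once the root is reached. */
    static String parent(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? null : path.substring(0, i);
    }

    public static void main(String[] args) {
        AncestorTracker tracker = new AncestorTracker();
        System.out.println(tracker.entriesToPut("/a/b/c.txt"));
        System.out.println(tracker.entriesToPut("/a/b/d.txt"));
    }
}
```

   The first call yields the two parents and the file; the second yields only the new file, which is the kind of saving the ancestor tracking aims for in commits and parallel renames.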
   
