steveloughran opened a new pull request #951: HADOOP-15183. S3Guard store 
becomes inconsistent after partial failure of rename
URL: https://github.com/apache/hadoop/pull/951
 
 
   
   Contributed by Steve Loughran.
   
   This is the squashed patch of PR #843 commit 115fb770
   
   Contains
   
   * HADOOP-13936. S3Guard: DynamoDB can go out of sync with 
S3AFileSystem.delete()
   
   * HADOOP-15604. Bulk commits of S3A MPUs place needless excessive load on S3 
& S3Guard
   
   * HADOOP-15658. Memory leak in S3AOutputStream
   
   * HADOOP-16364. S3Guard table destroy to map IllegalArgumentExceptions to 
IOEs
   
   This work adds to the S3Guard Metastore APIs
   
   * the notion of a "BulkOperation": a store-specific class which is 
requested before initiating bulk work (put, purge, rename) and which can then 
be used to cache table changes performed during the bulk operation. This lets 
renames and commit operations avoid creating duplicate parent entries in the 
tree: the store can track what has already been created or found.
   
   * the notion of a "RenameTracker" which factors out the task of updating a 
metastore with changes to the filesystem, both during a rename (files added 
and deleted) and after the operation completes, successfully or not.
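
   To illustrate the ancestor caching described for BulkOperation, here is a 
minimal sketch. The class and method names (AncestorTrackingBulkOperation, 
newAncestors) are hypothetical and are not the API added by this patch; paths 
are plain strings rather than Hadoop Path objects.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical stand-in for a store-specific bulk operation state.
 * It remembers every parent directory already recorded during the
 * operation so that the same entry is not written twice.
 */
class AncestorTrackingBulkOperation {

  /** Directories already created or found during this operation. */
  private final Set<String> known = new HashSet<>();

  /**
   * Return the ancestors of an absolute path which have not yet been
   * recorded in this operation, deepest first, and record them.
   * The root directory is never returned.
   */
  List<String> newAncestors(String path) {
    List<String> added = new ArrayList<>();
    int slash = path.lastIndexOf('/');
    while (slash > 0) {
      String parent = path.substring(0, slash);
      if (!known.add(parent)) {
        // Already recorded; all of its own ancestors were recorded
        // at the same time, so the walk can stop here.
        break;
      }
      added.add(parent);
      slash = parent.lastIndexOf('/');
    }
    return added;
  }
}
```

   With one instance shared across an operation, the first file under 
/a/b/c yields three new parent entries and the second file in the same 
directory yields none, which is the duplicate work being avoided.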
   
   The original rename update, the one which did not update the store until 
the end of the rename, is implemented as the DelayedUpdateRenameTracker, while a 
new ProgressiveRenameTracker updates the store as individual files are copied 
and when bulk deletes complete. To avoid performance problems, stores must 
provide a BulkOperation implementation which remembers ancestors added. The 
DynamoDBMetastore does this.
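
   The difference between the two tracking strategies can be sketched as 
follows. This is an illustration only: the interface, class, and method names 
(RenameTrackerSketch, DelayedSketch, ProgressiveSketch) are hypothetical, and 
the metastore is modelled as a plain in-memory set of path strings rather 
than the real S3Guard store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical callbacks a rename would invoke as it progresses. */
interface RenameTrackerSketch {
  void fileCopied(String source, String dest);
  void sourcesDeleted(List<String> sources);
  void completeRename();
}

/** Queues every change and applies them only when the rename completes. */
class DelayedSketch implements RenameTrackerSketch {
  private final Set<String> store;
  private final List<String> toAdd = new ArrayList<>();
  private final List<String> toRemove = new ArrayList<>();

  DelayedSketch(Set<String> store) {
    this.store = store;
  }

  public void fileCopied(String source, String dest) {
    toAdd.add(dest);
  }

  public void sourcesDeleted(List<String> sources) {
    toRemove.addAll(sources);
  }

  public void completeRename() {
    store.addAll(toAdd);
    store.removeAll(toRemove);
  }
}

/** Applies each change to the store as soon as it is reported. */
class ProgressiveSketch implements RenameTrackerSketch {
  private final Set<String> store;

  ProgressiveSketch(Set<String> store) {
    this.store = store;
  }

  public void fileCopied(String source, String dest) {
    store.add(dest);
  }

  public void sourcesDeleted(List<String> sources) {
    store.removeAll(sources);
  }

  public void completeRename() {
    // Nothing queued: the store already reflects the work done.
  }
}
```

   If a rename fails after one file has been copied and its source deleted, 
the progressive variant's store already matches what was actually done, 
whereas the delayed variant's store still lists the deleted source and omits 
the new destination.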
   
   Some of the new features are implemented as part of a gradual refactoring of 
the S3AFileSystem itself: the handling of partial delete failures is in its own 
class org.apache.hadoop.fs.s3a.impl.MultiObjectDeleteSupport which, rather than 
being given a reference back to the owning S3AFileSystem, is handed a 
StoreContext which contains restricted attributes and callbacks. As this 
refactoring continues in future patches, and the different layers of a new 
store model are factored out, this will be extended.
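
   The shape of that decoupling, a helper handed a narrow context instead of 
the whole filesystem, might look like the sketch below. Everything here is an 
assumption made for illustration: StoreContextSketch and PartialDeleteSketch 
are hypothetical names, and the fields and methods shown are not those of the 
real StoreContext or MultiObjectDeleteSupport.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical restricted view of the filesystem: one attribute, one callback. */
final class StoreContextSketch {
  private final String bucket;
  private final Function<String, String> keyToPath;

  StoreContextSketch(String bucket, Function<String, String> keyToPath) {
    this.bucket = bucket;
    this.keyToPath = keyToPath;
  }

  String getBucket() {
    return bucket;
  }

  String keyToPath(String key) {
    return keyToPath.apply(key);
  }
}

/** Hypothetical helper which only sees the context, never the filesystem. */
final class PartialDeleteSketch {
  private final StoreContextSketch context;

  PartialDeleteSketch(StoreContextSketch context) {
    this.context = context;
  }

  /**
   * Given the keys of a bulk delete request and the subset reported as
   * failed, return the paths of the objects which were deleted.
   */
  List<String> deletedPaths(List<String> requestedKeys, Set<String> failedKeys) {
    List<String> deleted = new ArrayList<>();
    for (String key : requestedKeys) {
      if (!failedKeys.contains(key)) {
        deleted.add(context.keyToPath(key));
      }
    }
    return deleted;
  }
}
```

   Because the helper depends only on the context, it can be exercised in a 
unit test with a lambda for the callback and no filesystem instance at all.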
   
   Change-Id: Ie0bd96ab861f0f30170b75f78e5503fc0e929524

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
