steveloughran commented on issue #1679: HDFS-13934. Multipart uploaders to be 
created through FileSystem/FileContext.
URL: https://github.com/apache/hadoop/pull/1679#issuecomment-546911202
 
 
   It's too early to worry about checkstyle failures; I'd like API reviews 
first. Thanks.
   
   One thing we haven't covered here is what to do about parent directories.
   
   Although it is not needed for S3, I would like to say "Parent directory must 
exist". 
   
   Then the S3A uploader would add a specific option to disable this check. Why 
so? Because for real file systems you want to specify the permissions of the 
parent directory, and I don't want to start adding that to the API given that 
`mkdirs` is already there.
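   
   To make the opt-out concrete, here is a minimal sketch of what such an 
uploader option could look like. All names here (the option key, the builder 
class) are hypothetical stand-ins, not the real S3A API:
   
   ```java
   // Hypothetical sketch: option key and builder are illustrative only.
   import java.util.HashMap;
   import java.util.Map;
   
   /** Minimal stand-in for an uploader builder carrying string options. */
   class UploaderBuilder {
     private final Map<String, String> options = new HashMap<>();
   
     UploaderBuilder opt(String key, String value) {
       options.put(key, value);
       return this;
     }
   
     boolean checkParentDir() {
       // Default behaviour: require the parent directory to exist.
       return !"true".equals(options.get("fs.s3a.multipart.uploader.skip.parent.check"));
     }
   }
   
   public class ParentCheckDemo {
     public static void main(String[] args) {
       UploaderBuilder defaults = new UploaderBuilder();
       UploaderBuilder s3a = new UploaderBuilder()
           .opt("fs.s3a.multipart.uploader.skip.parent.check", "true");
       System.out.println(defaults.checkParentDir()); // true: check enforced
       System.out.println(s3a.checkParentDir());      // false: S3A opts out
     }
   }
   ```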
   
   Note also that while this API would seem sufficient to reimplement the 
S3A committers, in HADOOP-15183 we added a `BulkOperationState` which a 
metastore may issue and which, for DynamoDB, keeps track of which paths it 
already knows exist, so as to avoid excessive/duplicate DynamoDB IO.
   
   For this multipart uploader to scale, we'd have to call 
`MetadataStore.initiateBulkWrite()` to get one of these, and use it for the 
parent-directory existence probes in both the upload and commit operations. The 
`BulkOperationState` would share the uploader's lifecycle and be closed with it.
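   
   A lifecycle sketch of that wiring, with self-contained stand-ins for 
`MetadataStore.initiateBulkWrite()` and `BulkOperationState` (the stub bodies 
are mine, not the S3Guard implementations):
   
   ```java
   // Lifecycle sketch: stub classes stand in for the S3Guard types.
   import java.io.Closeable;
   import java.io.IOException;
   
   /** Stand-in for the S3Guard bulk-operation handle. */
   class BulkOperationState implements Closeable {
     boolean closed;
     @Override public void close() { closed = true; }
   }
   
   /** Stand-in metastore that issues bulk-operation handles. */
   class MetadataStore {
     BulkOperationState initiateBulkWrite() { return new BulkOperationState(); }
   }
   
   /** Uploader holding one BulkOperationState for its whole lifecycle. */
   class MultipartUploader implements Closeable {
     private final BulkOperationState bulkState;
   
     MultipartUploader(MetadataStore store) {
       // One handle shared by all parent-dir probes in upload and commit.
       this.bulkState = store.initiateBulkWrite();
     }
   
     BulkOperationState bulkState() { return bulkState; }
   
     @Override public void close() throws IOException {
       bulkState.close();   // closed together with the uploader
     }
   }
   
   public class LifecycleDemo {
     public static void main(String[] args) throws IOException {
       MultipartUploader uploader = new MultipartUploader(new MetadataStore());
       BulkOperationState state = uploader.bulkState();
       uploader.close();
       System.out.println(state.closed); // true: state dies with the uploader
     }
   }
   ```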
   
   Important: we would need the same for copy operations, again to avoid 
excessive I/O. If I am copying 100 files, I don't want to make 100 * depth(file) 
calls to S3Guard.
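   
   The arithmetic behind that concern, as a toy demo (plain Java, not Hadoop 
code): without a shared state every copy re-probes every ancestor, while a 
shared known-ancestors set probes each directory once.
   
   ```java
   // Toy illustration of probe counts; the HashSet stands in for the
   // ancestor cache a BulkOperationState-style handle would provide.
   import java.util.HashSet;
   import java.util.Set;
   
   public class ProbeCountDemo {
     public static void main(String[] args) {
       int files = 100, depth = 4;
   
       // Naive: every copied file re-checks every ancestor directory.
       int naiveProbes = files * depth;
   
       // Shared state: remember which ancestors are known to exist.
       Set<String> known = new HashSet<>();
       int cachedProbes = 0;
       for (int f = 0; f < files; f++) {
         String dir = "";
         for (int d = 0; d < depth; d++) {
           dir = dir + "/d" + d;        // ancestor path at this level
           if (known.add(dir)) {        // first sighting: probe once
             cachedProbes++;
           }
         }
       }
       System.out.println(naiveProbes);   // 400
       System.out.println(cachedProbes);  // 4 (files share one directory tree)
     }
   }
   ```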

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]
