jackye1995 opened a new issue #1763:
URL: https://github.com/apache/iceberg/issues/1763


   This issue is for discussing multipart upload support in 
S3FileIO. Traditionally, people use the [S3 
TransferManager](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/examples-s3-transfermanager.html)
 to do multipart uploads. However, there are a few reasons against using it:
   1. We are using AWS SDK v2, where the transfer manager is not available yet. 
The [related GitHub issue](https://github.com/aws/aws-sdk-java-v2/issues/37) 
has not been updated for a long time. I am currently reaching out to the S3 team 
for more details, but most likely we will not have it in the short term.
   2. The transfer manager still requires buffering the entire content to local 
disk as a temp file before doing a multipart upload (as far as I know). 
@danielcweeks mentioned in the last community meeting that we can do 
progressive upload to minimize disk space usage, and I agree that sounds like 
the right approach.
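
   To make the progressive upload idea concrete, here is a minimal sketch 
(not the actual `S3FileIO` design). It assumes a hypothetical `PartUploader` 
callback standing in for the S3 UploadPart call, and it buffers one part in 
memory for brevity; a per-part temp file would work the same way. The part 
size of 4 bytes is only for the demo; S3 requires at least 5 MiB for every 
part except the last.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

public class ProgressiveUploadSketch {

  /** Hypothetical stand-in for an S3 UploadPart call; not a real SDK type. */
  interface PartUploader {
    void uploadPart(int partNumber, byte[] data) throws IOException;
  }

  /** Holds at most one part locally and ships it as soon as it is full. */
  static class ProgressiveOutputStream extends OutputStream {
    private final int partSize;
    private final PartUploader uploader;
    private final ByteArrayOutputStream buffer;
    private int nextPartNumber = 1; // S3 part numbers start at 1
    private boolean closed = false;

    ProgressiveOutputStream(int partSize, PartUploader uploader) {
      this.partSize = partSize;
      this.uploader = uploader;
      this.buffer = new ByteArrayOutputStream(partSize);
    }

    @Override
    public void write(int b) throws IOException {
      buffer.write(b);
      if (buffer.size() >= partSize) {
        flushPart();
      }
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
      while (len > 0) {
        // Copy only as much as fits in the current part.
        int n = Math.min(len, partSize - buffer.size());
        buffer.write(b, off, n);
        off += n;
        len -= n;
        if (buffer.size() >= partSize) {
          flushPart();
        }
      }
    }

    private void flushPart() throws IOException {
      uploader.uploadPart(nextPartNumber++, buffer.toByteArray());
      buffer.reset(); // local space is reclaimed after every part
    }

    @Override
    public void close() throws IOException {
      if (closed) {
        return;
      }
      closed = true;
      if (buffer.size() > 0) {
        flushPart(); // the last part may be smaller than partSize
      }
      // A real implementation would complete the multipart upload here.
    }
  }

  public static void main(String[] args) throws IOException {
    List<Integer> sizes = new ArrayList<>();
    try (OutputStream out =
        new ProgressiveOutputStream(4, (n, data) -> sizes.add(data.length))) {
      out.write(new byte[10]);
    }
    System.out.println(sizes); // prints [4, 4, 2]
  }
}
```

   With this shape, local usage is bounded by the part size rather than the 
file size, which is the main difference from buffering the whole file first.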
   
   So, a question for @danielcweeks: do you plan to contribute the progressive 
upload code in the short term? If not, we can hash out the design and I can 
implement it as soon as possible.
   
   Please also feel free to bring up any other suggestions for 
optimizing the write path of `S3FileIO` here before any of us submits an 
implementation. Thanks!




