jackye1995 opened a new issue #1763: URL: https://github.com/apache/iceberg/issues/1763
This issue is for discussing multipart upload support in `S3FileIO`. Traditionally, people use the [S3 TransferManager](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/examples-s3-transfermanager.html) for multipart uploads. However, there are a few reasons against using it:

1. We are using AWS SDK v2, where the transfer manager is not available yet. The [related GitHub issue](https://github.com/aws/aws-sdk-java-v2/issues/37) has not been updated for a long time. I am currently reaching out to the S3 team for more details, but most likely we will not have it in the short term.
2. As far as I know, the transfer manager still requires buffering the entire content to local disk as a temp file before doing a multipart upload.

@danielcweeks mentioned in the last community meeting that we can do a progressive upload to minimize disk space usage, and I agree that sounds like the right approach.

@danielcweeks, do you plan to contribute the progressive upload code in the short term? If not, we can hash out the design and I can implement it as soon as possible. Please feel free to bring up any other suggestions for optimizing the write path of `S3FileIO` here before either of us submits an implementation. Thanks!
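To make the progressive upload idea concrete, here is a minimal sketch of an output stream that holds at most one part in memory and hands each part off as soon as it fills, rather than staging the whole file first. It is an illustration only, not the design proposed by anyone in this thread. `PartUploader`, `ProgressiveUploadStream`, and `RecordingUploader` are hypothetical names; in an SDK v2 implementation the two callbacks would presumably wrap `S3Client.uploadPart` and `S3Client.completeMultipartUpload`, which is an assumption and is not shown. The tiny part size in the usage note is for demonstration; S3 requires every part except the last to be at least 5 MiB.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback standing in for the S3 multipart calls (assumption, not a real API). */
interface PartUploader {
  /** Must consume data[0..length) before returning, because the buffer is reused. */
  void uploadPart(int partNumber, byte[] data, int length) throws IOException;

  void complete(int partCount) throws IOException;
}

/** Buffers at most one part and uploads it as soon as it is full. */
class ProgressiveUploadStream extends OutputStream {
  private final PartUploader uploader;
  private final byte[] buffer;
  private int position = 0;
  private int partNumber = 0;
  private boolean closed = false;

  ProgressiveUploadStream(PartUploader uploader, int partSize) {
    if (partSize <= 0) {
      throw new IllegalArgumentException("partSize must be positive");
    }
    this.uploader = uploader;
    this.buffer = new byte[partSize];
  }

  @Override
  public void write(int b) throws IOException {
    ensureOpen();
    buffer[position++] = (byte) b;
    if (position == buffer.length) {
      flushPart();
    }
  }

  @Override
  public void write(byte[] src, int off, int len) throws IOException {
    ensureOpen();
    while (len > 0) {
      // Copy as much as fits into the current part, then upload it if full.
      int n = Math.min(len, buffer.length - position);
      System.arraycopy(src, off, buffer, position, n);
      position += n;
      off += n;
      len -= n;
      if (position == buffer.length) {
        flushPart();
      }
    }
  }

  @Override
  public void close() throws IOException {
    if (closed) {
      return;
    }
    closed = true;
    if (position > 0) {
      flushPart(); // final part, possibly shorter than partSize
    }
    // This sketch does not special-case empty streams (zero parts).
    uploader.complete(partNumber);
  }

  private void flushPart() throws IOException {
    partNumber++; // S3 part numbers start at 1
    uploader.uploadPart(partNumber, buffer, position);
    position = 0;
  }

  private void ensureOpen() throws IOException {
    if (closed) {
      throw new IOException("stream is closed");
    }
  }
}

/** In-memory uploader used only to demonstrate and check the stream. */
class RecordingUploader implements PartUploader {
  final List<Integer> partLengths = new ArrayList<>();
  final ByteArrayOutputStream assembled = new ByteArrayOutputStream();
  int completedParts = -1;

  @Override
  public void uploadPart(int partNumber, byte[] data, int length) {
    partLengths.add(length);
    assembled.write(data, 0, length);
  }

  @Override
  public void complete(int partCount) {
    completedParts = partCount;
  }
}
```

Writing 10 bytes through `new ProgressiveUploadStream(new RecordingUploader(), 4)` and closing the stream produces three parts of 4, 4, and 2 bytes. The sketch uploads synchronously from memory; buffering each part to a small temp file or uploading parts in parallel are variations this thread leaves open.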
