amogh-jahagirdar opened a new pull request #4052: URL: https://github.com/apache/iceberg/pull/4052
Starting a draft PR for performing batch deletion of S3 objects. Related issue: https://github.com/apache/iceberg/issues/4012

This can be useful for the expire snapshots and remove orphan files operations. In this PR we update the FileIO interface and add an S3 implementation to perform the batch removal. Updating the actions for expiring snapshots and removing orphan files can be tackled in separate PRs.

My thought there is that we could have a separate batching mechanism within the action implementation: for an action, a user would specify a batch size (defaulting to 1). Tasks looks like it is generic, so we could partition the given input list into batches and use those batches as its input, then pass in a function which accepts a list of strings to perform the batch deletion. This would only be done when the batch size is greater than 1.

Any thoughts? @szehon-ho @dramaticlly
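To make the batching idea above concrete, here is a rough sketch of how an action might split its input into batches and hand each batch to a bulk-delete function. It is not code from this PR and does not use Iceberg's Tasks or FileIO classes: `BatchDeleteSketch`, `partition`, `deleteAll`, and the two `Consumer` parameters are hypothetical names chosen for illustration, and the S3 paths are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only; none of these names come from Iceberg.
class BatchDeleteSketch {

  // Split the input into consecutive batches of at most batchSize paths.
  static List<List<String>> partition(List<String> paths, int batchSize) {
    if (batchSize < 1) {
      throw new IllegalArgumentException("batchSize must be >= 1, got " + batchSize);
    }
    List<List<String>> batches = new ArrayList<>();
    for (int start = 0; start < paths.size(); start += batchSize) {
      int end = Math.min(start + batchSize, paths.size());
      batches.add(new ArrayList<>(paths.subList(start, end)));
    }
    return batches;
  }

  // With batchSize == 1 (the proposed default) each path is deleted on its own;
  // only a larger batch size takes the bulk-delete path.
  static void deleteAll(
      List<String> paths,
      int batchSize,
      Consumer<String> deleteOne,
      Consumer<List<String>> deleteBatch) {
    if (batchSize < 1) {
      throw new IllegalArgumentException("batchSize must be >= 1, got " + batchSize);
    }
    if (batchSize == 1) {
      paths.forEach(deleteOne);
      return;
    }
    for (List<String> batch : partition(paths, batchSize)) {
      deleteBatch.accept(batch);
    }
  }

  public static void main(String[] args) {
    // Made-up paths; the lambdas stand in for single and bulk delete calls.
    List<String> paths = List.of(
        "s3://bucket/a", "s3://bucket/b", "s3://bucket/c", "s3://bucket/d", "s3://bucket/e");
    deleteAll(
        paths,
        2,
        path -> System.out.println("delete " + path),
        batch -> System.out.println("batch delete " + batch));
  }
}
```

In a real action, the batches would presumably be the items handed to the existing task runner, so retries and failure handling would apply per batch rather than per file. That trade-off is part of what needs discussing.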
