dramaticlly opened a new pull request, #5412: URL: https://github.com/apache/iceberg/pull/5412
In #4052, S3FileIO implemented a new interface to support S3 batch deletion. This PR introduces it to the Spark expire-snapshots procedure, which now conditionally deletes files in batches when the underlying FileIO supports it, i.e., when it implements the `SupportsBulkOperations` interface (currently only S3FileIO does). It defaults to S3 batch deletion when the catalog's FileIO supports it, and allows customization via a new `bulkDeleteWith` method (see the sketch below).

- The existing `bulkDelete` consumer function cannot be reused because it takes a single file name at a time instead of an iterable.
- No interface override was added, to keep this change as small as possible; once approved, it can be retroactively applied to the interface and to previous Spark versions (2.4/3.0/3.1/3.2).
- The existing test fixtures all use HadoopTables, and it is very hard to test the integration of S3FileIO and the Spark action together in unit tests, so I am looking for a way to do some integration tests and will share the results later.

Similar to #5373, but for the expire-snapshots procedure.

CC @rdblue, @danielcweeks, @amogh-jahagirdar, @szehon-ho
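For illustration, here is a minimal sketch (not the PR's actual code) of the conditional dispatch described above, assuming `SupportsBulkOperations` exposes a `deleteFiles(Iterable<String>)` method as introduced in #4052. The `deleteExpiredFiles` helper name is hypothetical:

```java
import org.apache.iceberg.io.FileIO;
import org.apache.iceberg.io.SupportsBulkOperations;

class BulkDeleteSketch {
  // Hypothetical helper: dispatch to batch deletion when the FileIO supports
  // it (currently only S3FileIO), otherwise fall back to per-file deletes.
  static void deleteExpiredFiles(FileIO io, Iterable<String> paths) {
    if (io instanceof SupportsBulkOperations) {
      // S3FileIO groups these paths into S3 batch-delete requests.
      ((SupportsBulkOperations) io).deleteFiles(paths);
    } else {
      for (String path : paths) {
        io.deleteFile(path);
      }
    }
  }
}
```

The same shape is why `bulkDeleteWith` presumably needs to accept a consumer of an iterable of paths rather than a single path: batching only works if the whole set of files to delete is visible to one call.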
