dramaticlly opened a new pull request, #5412:
URL: https://github.com/apache/iceberg/pull/5412

   In #4052, S3FileIO implemented a new interface to support S3 batch 
deletion. This PR introduces it to the Spark expire-snapshots procedure, which 
now conditionally deletes files in batches when the underlying FileIO supports 
it (i.e. it implements the `SupportsBulkOperations` interface; currently only 
S3FileIO does).
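   
   For reviewers skimming, the dispatch looks roughly like the sketch below. 
`SupportsBulkOperations` and its `deleteFiles(Iterable<String>)` method come 
from #4052; the helper class and the per-file fallback loop here are 
illustrative, not the exact code in this PR.
   
```java
import org.apache.iceberg.io.FileIO;
import org.apache.iceberg.io.SupportsBulkOperations;

class BulkDeleteDispatch {
  // Delete the given paths, batching when the FileIO supports it.
  static void deleteFiles(FileIO io, Iterable<String> paths) {
    if (io instanceof SupportsBulkOperations) {
      // S3FileIO groups these into S3 batch (DeleteObjects) requests
      ((SupportsBulkOperations) io).deleteFiles(paths);
    } else {
      // other FileIO implementations fall back to per-file deletes
      for (String path : paths) {
        io.deleteFile(path);
      }
    }
  }
}
```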
   
   By default it uses S3 batch deletion when the catalog's FileIO supports it, 
and allows customization via the `bulkDeleteWith` method (see the usage sketch 
after the notes below).
   - cannot reuse the existing delete consumer (`deleteWith`) because it only 
takes a single file name at a time instead of an iterable
   - did not add an interface override, to keep this change as small as 
possible; once approved, we can retroactively apply it to the interface and to 
previous Spark versions (2.4/3.0/3.1/3.2)
   - the existing test fixtures all use HadoopTables, and it is very hard to 
test the integration of S3FileIO and Spark actions together in unit tests, so 
I am looking for a way to run some integration tests and will share the 
results later
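   
   As referenced above, here is a usage sketch of the proposed 
`bulkDeleteWith` hook, assuming the consumer receives an `Iterable<String>` 
batch of paths (per the first note above); the consumer body, table, and 
expiration settings are just placeholders.
   
```java
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import org.apache.iceberg.Table;
import org.apache.iceberg.spark.actions.SparkActions;
import org.apache.spark.sql.SparkSession;

class ExpireSnapshotsExample {
  static void expire(SparkSession spark, Table table) {
    // unlike deleteWith, this receives a whole batch of paths at once
    Consumer<Iterable<String>> bulkDelete =
        paths -> paths.forEach(path -> System.out.println("deleting " + path));

    SparkActions.get(spark)
        .expireSnapshots(table)
        .expireOlderThan(System.currentTimeMillis() - TimeUnit.DAYS.toMillis(7))
        .bulkDeleteWith(bulkDelete) // overrides the default batch delete
        .execute();
  }
}
```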
   
   Similar to #5373, but for the expire-snapshots procedure.
   
   CC @rdblue , @danielcweeks, @amogh-jahagirdar, @szehon-ho 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

