amogh-jahagirdar commented on a change in pull request #4052:
URL: https://github.com/apache/iceberg/pull/4052#discussion_r814142157



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIO.java
##########
@@ -100,6 +115,67 @@ public void deleteFile(String path) {
     client().deleteObject(deleteRequest);
   }
 
+  /**
+   * Deletes the given paths in a batched manner.
+   * <p>
+   * The paths are grouped by bucket, and deletion is triggered when we either reach the configured batch size
+   * or have a final remainder batch for each bucket.
+   *
+   * @param paths paths to delete
+   */
+  @Override
+  public void deleteFiles(Iterable<String> paths) {
+    SetMultimap<String, String> bucketToObjects = Multimaps.newSetMultimap(Maps.newHashMap(), Sets::newHashSet);
+    List<String> failedDeletions = Lists.newArrayList();
+    for (String path : paths) {
+      S3URI location = new S3URI(path);
+      String bucket = location.bucket();
+      String objectKey = location.key();
+      Set<String> objectsInBucket = bucketToObjects.get(bucket);
+      if (objectsInBucket.size() == awsProperties.s3FileIoDeleteBatchSize()) {
+        List<String> failedDeletionsForBatch = deleteObjectsInBucket(bucket, objectsInBucket);

Review comment:
       I was thinking it would be up to whoever provides the S3 client to 
configure the retry policy on that client. Is that within the scope of FileIO? 
If so, I think that's something we could tackle in a follow-on.
   
   Someone could use a custom AwsClientFactory. The DefaultAwsClientFactory 
creates an S3 client with the default retry policy, which retries on the 
failures listed in 
https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/retry/PredefinedRetryPolicies.SDKDefaultRetryCondition.html.
   
   
   So, by default, 5xx errors (such as service unavailable), throttling 
errors, and clock-skew errors would be retried. Failures such as the bucket 
not existing, or unauthorized 4xx errors, would not be retried. 
@jackye1995 @rdblue thoughts?
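
To make the retry split above concrete, here is a rough sketch of that kind of classification in plain Java. It is not the SDK's actual retry condition: the class `RetrySketch`, the method `isRetryable`, and the two error-code sets are hypothetical names chosen for illustration, and treating every 5xx status as retryable is a simplification of what the default policy does.

```java
import java.util.Set;

/** Hypothetical sketch of a default-style retry decision; not the AWS SDK implementation. */
final class RetrySketch {
  // Assumed sets of S3 error codes, for illustration only.
  private static final Set<String> THROTTLING_CODES = Set.of("SlowDown", "Throttling");
  private static final Set<String> CLOCK_SKEW_CODES = Set.of("RequestTimeTooSkewed");

  private RetrySketch() {
  }

  /** Returns true if a failed request with this status and error code would be retried. */
  static boolean isRetryable(int statusCode, String errorCode) {
    // Simplification: treat all server-side (5xx) failures as transient.
    if (statusCode >= 500 && statusCode <= 599) {
      return true;
    }
    if (errorCode == null) {
      return false;
    }
    // Throttling and clock skew are retried even when reported with a 4xx status.
    return THROTTLING_CODES.contains(errorCode) || CLOCK_SKEW_CODES.contains(errorCode);
  }

  public static void main(String[] args) {
    System.out.println(isRetryable(503, "ServiceUnavailable"));   // true
    System.out.println(isRetryable(403, "RequestTimeTooSkewed")); // true
    System.out.println(isRetryable(404, "NoSuchBucket"));         // false
    System.out.println(isRetryable(403, "AccessDenied"));         // false
  }
}
```

With a classification like this, a missing bucket or an access-denied response surfaces immediately as a failed deletion, while transient failures are left to the client's retry policy.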




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


