amogh-jahagirdar commented on code in PR #5379:
URL: https://github.com/apache/iceberg/pull/5379#discussion_r938272527


##########
aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIO.java:
##########
@@ -168,56 +169,42 @@ public Map<String, String> properties() {
   @Override
   public void deleteFiles(Iterable<String> paths) throws 
BulkDeletionFailureException {
     if (awsProperties.s3DeleteTags() != null && 
!awsProperties.s3DeleteTags().isEmpty()) {
+      Set<Tag> deleteTags = awsProperties.s3DeleteTags();
       Tasks.foreach(paths)
           .noRetry()
           .executeWith(executorService())
           .suppressFailureWhenFinished()
           .onFailure(
-              (path, exc) ->
-                  LOG.warn(
-                      "Failed to add delete tags: {} to {}",
-                      awsProperties.s3DeleteTags(),
-                      path,
-                      exc))
-          .run(path -> tagFileToDelete(path, awsProperties.s3DeleteTags()));
+              (path, exc) -> LOG.warn("Failed to add delete tags: {} to {}", 
deleteTags, path, exc))
+          .run(path -> tagFileToDelete(path, deleteTags));
     }
 
-    if (!awsProperties.isS3DeleteEnabled()) {
-      return;
+    if (awsProperties.isS3DeleteEnabled()) {
+      SetMultimap<String, String> bucketToObjects = 
computeBucketToObjects(paths);

Review Comment:
   @jackye1995 I don't think we need a separate API for listing actually, we 
are already given the iterable of paths. This iterable can be the 
ListObjectsV2Iterable (which is done as of today in the deletePrefix). 
   
   For lazily loading in memory, and still deleting in a concurrent manner we 
can do the following: 
   
   1.) Still use the previous approach and constructing the bucket/key from the 
path and keeping track of when the objects for a bucket hits a certain batch 
size.
   
   2.) Instead of using task framework, submit the deletion to an executor 
service and just keep track of the future. we don't want to use tasks because 
it will wait for the completion internally
   We want to just submit the batch deletion and move on, and then at the very 
end check the statuses of all those tasks. So using the underlying executor 
service fits that pattern. Let me know what you think. 
    
    Will update the PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to