xloya commented on code in PR #4427:
URL: https://github.com/apache/iceberg/pull/4427#discussion_r843412283


##########
core/src/main/java/org/apache/iceberg/util/Tasks.java:
##########
@@ -212,7 +212,9 @@ public boolean run(Task<I, RuntimeException> task) {
           I item = iterator.next();
           try {
             runTaskWithRetry(task, item);
-            succeeded.add(item);
+            if (revertTask != null) {
+              succeeded.add(item);
+            }

Review Comment:
   Let me explain why the change is needed here:  
   At present, when rewriting, it will be rewritten with the `RewriteFileGroup` 
as the rewrite unit through the `Tasks` util class without `revertWith` method: 
 
   
https://github.com/apache/iceberg/blob/56d8f07ce6cf7ac99f439848ebf7c8b86f5046df/spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/actions/BaseRewriteDataFilesSparkAction.java#L307-L313
   But after the rewrite is finished, there is no need to revert the successful 
task. As you can see from the following code, unless we call and set 
`revertTask`, the logic here will not be executed anyway:  
   
https://github.com/apache/iceberg/blob/56d8f07ce6cf7ac99f439848ebf7c8b86f5046df/core/src/main/java/org/apache/iceberg/util/Tasks.java#L126-L129
   
https://github.com/apache/iceberg/blob/56d8f07ce6cf7ac99f439848ebf7c8b86f5046df/core/src/main/java/org/apache/iceberg/util/Tasks.java#L356-L380
   But the current logic is that no matter what, the reference of the 
successful task will be saved to the concurrent queue of `succeeded`, and the 
`RewriteFileGroup` that has been rewritten will also be saved. If we want to 
release the memory of `RewriteFileGroup`, then we should also remove 
unnecessary references to it here:  
   
https://github.com/apache/iceberg/blob/56d8f07ce6cf7ac99f439848ebf7c8b86f5046df/core/src/main/java/org/apache/iceberg/util/Tasks.java#L306-L316



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to