jackye1995 commented on a change in pull request #3207:
URL: https://github.com/apache/iceberg/pull/3207#discussion_r718929401



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDataFiles.java
##########
@@ -76,6 +76,22 @@
    */
   String TARGET_FILE_SIZE_BYTES = "target-file-size-bytes";
 
+  /**
+   * Determines if the data rewrite action should also remove non-global 
deletes associated with the data files.
+   * By enabling this option, any data filter specified through {@link 
#filter(Expression)} will be converted to
+   * an inclusive partition filter based on all the historical partition specs 
of the table.

Review comment:
       My understanding is that for non-global deletes, as long as the 
filter is a partition filter, compacting all the data files produced by the 
plan lets us safely remove the associated delete files. This is inefficient, 
though, because technically we should do the following:
   1. get all data files satisfying the filter
   2. get the delete files of the data files
   3. for the delete files, find the connected component (if we view this as a 
dependency graph of files), which might produce a much smaller subset of data 
files to compact
   4. replan tasks based on the set of data and delete files
   
   But that strays too far from the RewriteDataFiles action, and might be 
overkill in the end.
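
   To make step 3 concrete, here is a rough sketch of the connected-component grouping. It is not Iceberg code: file names are plain strings, and `DeleteGraphSketch`, `connectedDataFiles` and the `deletesByDataFile` map are hypothetical names I made up for illustration. It assumes we already have a map from each data file to the delete files that apply to it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not part of the Iceberg API.
public class DeleteGraphSketch {

  // Groups data files that are linked, directly or transitively, through
  // shared delete files. Data files without any delete file are skipped,
  // since rewriting them does not help remove deletes.
  static List<Set<String>> connectedDataFiles(Map<String, Set<String>> deletesByDataFile) {
    // Invert the mapping: delete file -> data files it applies to.
    Map<String, Set<String>> dataByDelete = new HashMap<>();
    for (Map.Entry<String, Set<String>> entry : deletesByDataFile.entrySet()) {
      for (String deleteFile : entry.getValue()) {
        dataByDelete.computeIfAbsent(deleteFile, k -> new TreeSet<>()).add(entry.getKey());
      }
    }

    Set<String> visited = new HashSet<>();
    List<Set<String>> components = new ArrayList<>();

    // Sorted iteration only makes the output order deterministic.
    for (String start : new TreeSet<>(deletesByDataFile.keySet())) {
      if (deletesByDataFile.get(start).isEmpty() || !visited.add(start)) {
        continue;
      }

      // Breadth-first search over data file -> delete file -> data file edges.
      Set<String> component = new TreeSet<>();
      Deque<String> queue = new ArrayDeque<>();
      queue.add(start);
      while (!queue.isEmpty()) {
        String dataFile = queue.poll();
        component.add(dataFile);
        for (String deleteFile : deletesByDataFile.get(dataFile)) {
          for (String neighbor : dataByDelete.get(deleteFile)) {
            if (visited.add(neighbor)) {
              queue.add(neighbor);
            }
          }
        }
      }
      components.add(component);
    }
    return components;
  }
}
```

   Each returned group would then be the unit for replanning in step 4: rewriting every data file in a group is what makes the delete files attached to that group removable.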




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


