puneetzaroo commented on a change in pull request #3207:
URL: https://github.com/apache/iceberg/pull/3207#discussion_r739570457



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDataFiles.java
##########
@@ -76,6 +76,22 @@
    */
   String TARGET_FILE_SIZE_BYTES = "target-file-size-bytes";
 
+  /**
+   * Determines if the data rewrite action should also remove non-global deletes associated with the data files.
+   * By enabling this option, any data filter specified through {@link #filter(Expression)} will be converted to
+   * an inclusive partition filter based on all the historical partition specs of the table.

Review comment:
       Maybe we are saying the same thing, but why can't the action find the
set of data files that are most impacted by delete files (i.e., those with the
most delete files corresponding to them) and then merge the delete files into
those data files, producing new data files? It seems to me that, just as the
size-based strategy has thresholds based on file size, the delete-file merge
strategy should have thresholds based on the number of delete files
corresponding to each data file.
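
To make the suggestion concrete, here is a minimal sketch of the selection
step, assuming the number of delete files per data file is already known. The
class `DeleteCountSelector`, the method `selectFiles`, and the
`minDeleteFiles` threshold are hypothetical names used for illustration only;
none of them are part of the Iceberg API or of this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Iceberg API.
final class DeleteCountSelector {
  private DeleteCountSelector() {}

  /**
   * Returns the paths of data files that have at least minDeleteFiles delete
   * files, ordered from most to fewest delete files (ties broken by path).
   */
  static List<String> selectFiles(Map<String, Integer> deleteFileCounts, int minDeleteFiles) {
    List<Map.Entry<String, Integer>> candidates = new ArrayList<>();
    for (Map.Entry<String, Integer> entry : deleteFileCounts.entrySet()) {
      if (entry.getValue() >= minDeleteFiles) {
        candidates.add(entry);
      }
    }

    // Most-impacted data files first, so they are rewritten first.
    candidates.sort((a, b) -> {
      int byCount = Integer.compare(b.getValue(), a.getValue());
      return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
    });

    List<String> paths = new ArrayList<>();
    for (Map.Entry<String, Integer> entry : candidates) {
      paths.add(entry.getKey());
    }
    return paths;
  }

  public static void main(String[] args) {
    // Assumed input: data file path -> number of delete files that apply to it.
    Map<String, Integer> counts = Map.of("a.parquet", 5, "b.parquet", 1, "c.parquet", 3);
    System.out.println(selectFiles(counts, 3)); // prints [a.parquet, c.parquet]
  }
}
```

The threshold here plays the same role for delete files that the file-size
thresholds play in the size-based strategy: data files below it are left
alone, and the rest would be rewritten with their deletes applied.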




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


