szehon-ho commented on code in PR #7389:
URL: https://github.com/apache/iceberg/pull/7389#discussion_r1180901970


##########
api/src/main/java/org/apache/iceberg/actions/RewritePositionDeleteFiles.java:
##########
@@ -41,11 +119,60 @@
   RewritePositionDeleteFiles filter(Expression expression);
 
   /** The action result that contains a summary of the execution. */
+  @Value.Immutable
   interface Result {
+    List<PositionDeleteGroupRewriteResult> rewriteResults();
+
     /** Returns the count of the position deletes that been rewritten. */
     int rewrittenDeleteFilesCount();
 
     /** Returns the count of the added delete files. */
     int addedDeleteFilesCount();
+
+    /** Returns the number of bytes of position deletes that have been rewritten */
+    long rewrittenBytesCount();
+
+    /** Returns the number of bytes of newly added position deletes */
+    long addedBytesCount();
+  }
+
+  /**
+   * For a particular position delete file group, the number of position delete files which are
+   * newly created and the number of files which were formerly part of the table but have been
+   * rewritten.
+   */
+  @Value.Immutable
+  interface PositionDeleteGroupRewriteResult {

Review Comment:
   Sorry, I'm not familiar with that. Could you point me to it?



##########
api/src/main/java/org/apache/iceberg/actions/RewritePositionDeleteFiles.java:
##########
@@ -28,6 +32,80 @@
 public interface RewritePositionDeleteFiles
     extends SnapshotUpdate<RewritePositionDeleteFiles, RewritePositionDeleteFiles.Result> {
 
+  /**
+   * Enable committing groups of files (see max-file-group-size-bytes) prior to the entire rewrite
+   * completing. This will produce additional commits but allow for progress even if some groups
+   * fail to commit. This setting will not change the correctness of the rewrite operation as file
+   * groups can be compacted independently.
+   *
+   * <p>The default is false, which produces a single commit when the entire job has completed.
+   */
+  String PARTIAL_PROGRESS_ENABLED = "partial-progress.enabled";
+
+  boolean PARTIAL_PROGRESS_ENABLED_DEFAULT = false;
+
+  /**
+   * The maximum amount of Iceberg commits that this rewrite is allowed to produce if partial
+   * progress is enabled. This setting has no effect if partial progress is disabled.
+   */
+  String PARTIAL_PROGRESS_MAX_COMMITS = "partial-progress.max-commits";
+
+  int PARTIAL_PROGRESS_MAX_COMMITS_DEFAULT = 10;
+
+  /**
+   * The entire rewrite operation is broken down into pieces based on partitioning and within
+   * partitions based on size into groups. These sub-units of the rewrite are referred to as file
+   * groups. The largest amount of data that should be compacted in a single group is controlled by
+   * {@link #MAX_FILE_GROUP_SIZE_BYTES}. This helps with breaking down the rewriting of very large
+   * partitions which may not be rewritable otherwise due to the resource constraints of the
+   * cluster. For example a sort based rewrite may not scale to terabyte sized partitions, those
+   * partitions need to be worked on in small subsections to avoid exhaustion of resources.
+   *
+   * <p>When grouping files, the underlying rewrite strategy will use this value as to limit the
+   * files which will be included in a single file group. A group will be processed by a single
+   * framework "action". For example, in Spark this means that each group would be rewritten in its
+   * own Spark action. A group will never contain files for multiple output partitions.
+   */
+  String MAX_FILE_GROUP_SIZE_BYTES = "max-file-group-size-bytes";

Review Comment:
   Removed 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

