rdblue commented on a change in pull request #2865:
URL: https://github.com/apache/iceberg/pull/2865#discussion_r676770243



##########
File path: api/src/main/java/org/apache/iceberg/RewriteFiles.java
##########
@@ -54,12 +54,23 @@ default RewriteFiles rewriteFiles(Set<DataFile> filesToDelete, Set<DataFile> fil
   /**
    * Add a rewrite that replaces one set of files with another set that contains the same data.
    *
-   * @param dataFilesToDelete   data files that will be replaced (deleted).
-   * @param deleteFilesToDelete delete files that will be replaced (deleted).
+   * @param dataFilesToReplace   data files that will be replaced (deleted).
+   * @param deleteFilesToReplace delete files that will be replaced (deleted).
    * @param dataFilesToAdd      data files that will be added.
    * @param deleteFilesToAdd    delete files that will be added.
    * @return this for method chaining.
    */
-  RewriteFiles rewriteFiles(Set<DataFile> dataFilesToDelete, Set<DeleteFile> deleteFilesToDelete,
+  RewriteFiles rewriteFiles(Set<DataFile> dataFilesToReplace, Set<DeleteFile> deleteFilesToReplace,
                             Set<DataFile> dataFilesToAdd, Set<DeleteFile> deleteFilesToAdd);
+
+  /**
+   * Set the snapshot ID used in any reads for this operation.
+   * <p>
+   * Validations will check changes after this snapshot ID. If the from snapshot is not set, all ancestor snapshots
+   * through the table's initial snapshot are validated.
+   *
+   * @param snapshotId a snapshot ID
+   * @return this for method chaining
+   */
+  RewriteFiles validateFromSnapshot(long snapshotId);

Review comment:
       This is necessary because the current snapshot's ID may not be the one 
that the operation was planned against. This `RewriteFiles` operation may be 
created significantly later than the original planning that is done for 
compaction. For example, in the new Spark compaction code, the rewrite is 
created when a group is ready to commit. The table state may have changed 
since planning, so using the current snapshot ID could easily skip delta 
commits that were interleaved, leaving the same issue that we are fixing here.
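
To make the timing problem concrete, here is a minimal, self-contained sketch. It does not use the Iceberg API: `Snapshot`, `conflictingCommitsAfter`, and the snapshot IDs are hypothetical stand-ins, and the "validation" is reduced to listing later commits that added delete files. It only illustrates why validation needs to start from the snapshot that planning read, not the snapshot that is current when the rewrite is created.

```java
import java.util.ArrayList;
import java.util.List;

public class ValidateFromSnapshotSketch {

  // Hypothetical stand-in for a table snapshot: an ID plus a flag saying
  // whether that commit added delete files (a "delta" commit).
  static final class Snapshot {
    final long id;
    final boolean addsDeletes;

    Snapshot(long id, boolean addsDeletes) {
      this.id = id;
      this.addsDeletes = addsDeletes;
    }
  }

  // Returns the IDs of delta commits that come after startId in the history
  // (ordered oldest to newest). A null startId means "not set", in which case
  // every snapshot in the history is checked.
  static List<Long> conflictingCommitsAfter(List<Snapshot> history, Long startId) {
    List<Long> conflicts = new ArrayList<>();
    boolean afterStart = (startId == null);
    for (Snapshot s : history) {
      if (afterStart && s.addsDeletes) {
        conflicts.add(s.id);
      }
      if (startId != null && s.id == startId) {
        afterStart = true;
      }
    }
    return conflicts;
  }

  public static void main(String[] args) {
    List<Snapshot> history = new ArrayList<>();
    history.add(new Snapshot(1L, false));

    // Compaction is planned against snapshot 1.
    long plannedAt = 1L;

    // A delta commit lands while the compaction job is still running.
    history.add(new Snapshot(2L, true));

    // The rewrite is only created now, when the group is ready to commit.
    long currentAtCommit = history.get(history.size() - 1).id;

    // Starting from the current snapshot misses the interleaved commit.
    System.out.println(conflictingCommitsAfter(history, currentAtCommit)); // prints []

    // Starting from the planning snapshot finds it.
    System.out.println(conflictingCommitsAfter(history, plannedAt)); // prints [2]
  }
}
```

In the real API this corresponds to recording the snapshot ID at planning time and passing it to `validateFromSnapshot` on the `RewriteFiles` operation before committing.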




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


