rdblue commented on a change in pull request #2865:
URL: https://github.com/apache/iceberg/pull/2865#discussion_r676770243
##########
File path: api/src/main/java/org/apache/iceberg/RewriteFiles.java
##########
@@ -54,12 +54,23 @@ default RewriteFiles rewriteFiles(Set<DataFile> filesToDelete, Set<DataFile> fil
/**
* Add a rewrite that replaces one set of files with another set that contains the same data.
*
- * @param dataFilesToDelete data files that will be replaced (deleted).
- * @param deleteFilesToDelete delete files that will be replaced (deleted).
+ * @param dataFilesToReplace data files that will be replaced (deleted).
+ * @param deleteFilesToReplace delete files that will be replaced (deleted).
* @param dataFilesToAdd data files that will be added.
* @param deleteFilesToAdd delete files that will be added.
* @return this for method chaining.
*/
- RewriteFiles rewriteFiles(Set<DataFile> dataFilesToDelete, Set<DeleteFile> deleteFilesToDelete,
+ RewriteFiles rewriteFiles(Set<DataFile> dataFilesToReplace, Set<DeleteFile> deleteFilesToReplace,
Set<DataFile> dataFilesToAdd, Set<DeleteFile> deleteFilesToAdd);
+
+ /**
+ * Set the snapshot ID used in any reads for this operation.
+ * <p>
+ * Validations will check changes after this snapshot ID. If the from snapshot is not set, all ancestor snapshots
+ * through the table's initial snapshot are validated.
+ *
+ * @param snapshotId a snapshot ID
+ * @return this for method chaining
+ */
+ RewriteFiles validateFromSnapshot(long snapshotId);
Review comment:
This is necessary because the current snapshot's ID may not be the one
that was used to plan the operation. This `RewriteFiles` operation may be
created significantly later than the original planning that is done for
compaction. For example, in the new Spark compaction code, the rewrite is
created when a group is ready to commit. The table state may have changed
since planning, so using the current snapshot ID could easily skip delta
commits that were interleaved, leaving the same issue that we are fixing here.
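
To make the failure mode concrete, here is a minimal, self-contained sketch of
why the starting snapshot matters. It does not use Iceberg classes: `Commit`,
`conflictingCommits`, and `sampleHistory` are hypothetical names invented for
illustration, and the history is modeled as a plain ordered list rather than
real snapshot ancestry. A `null` starting ID stands in for "from snapshot not
set", in which case every commit in the history is checked.

```java
import java.util.ArrayList;
import java.util.List;

public class ValidateFromSnapshotSketch {
  // Hypothetical stand-in for a table commit; not Iceberg's Snapshot class.
  static final class Commit {
    final long snapshotId;
    final boolean isDelta; // true if the commit added row-level deletes

    Commit(long snapshotId, boolean isDelta) {
      this.snapshotId = snapshotId;
      this.isDelta = isDelta;
    }
  }

  // Returns the IDs of delta commits that come after startSnapshotId in the
  // history. History order is used, not ID order, because snapshot IDs are
  // not assumed to be increasing. A null start means "validate everything".
  static List<Long> conflictingCommits(List<Commit> history, Long startSnapshotId) {
    List<Long> conflicts = new ArrayList<>();
    boolean afterStart = (startSnapshotId == null);
    for (Commit commit : history) {
      if (afterStart && commit.isDelta) {
        conflicts.add(commit.snapshotId);
      }
      if (startSnapshotId != null && commit.snapshotId == startSnapshotId) {
        afterStart = true;
      }
    }
    return conflicts;
  }

  // Oldest first: planning snapshot, interleaved delta, current snapshot.
  static List<Commit> sampleHistory() {
    return List.of(
        new Commit(1L, false),  // compaction is planned against this snapshot
        new Commit(2L, true),   // delta commit interleaved after planning
        new Commit(3L, false)); // current snapshot when the rewrite is created
  }

  public static void main(String[] args) {
    List<Commit> history = sampleHistory();
    // Validating from the planning snapshot sees the interleaved delta.
    System.out.println(conflictingCommits(history, 1L)); // prints [2]
    // Validating from the current snapshot skips it entirely.
    System.out.println(conflictingCommits(history, 3L)); // prints []
  }
}
```

With the method added in this diff, the caller would presumably record the
snapshot ID at planning time and pass it to `validateFromSnapshot(long)` on the
`RewriteFiles` operation before committing, rather than reading the current
snapshot ID when the rewrite is created.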
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]