rdblue commented on a change in pull request #3199:
URL: https://github.com/apache/iceberg/pull/3199#discussion_r718939020
##########
File path: api/src/main/java/org/apache/iceberg/OverwriteFiles.java
##########
@@ -145,4 +151,50 @@
*/
@Deprecated
OverwriteFiles validateNoConflictingAppends(Long readSnapshotId, Expression conflictDetectionFilter);
+
+ /**
+ * Sets a conflict detection filter used to validate concurrently added data and delete files.
+ * <p>
+ * If not called, a true literal will be used as the conflict detection filter.
+ *
+ * @param conflictDetectionFilter an expression on rows in the table
+ * @return this for method chaining
+ */
+ OverwriteFiles conflictDetectionFilter(Expression conflictDetectionFilter);
+
+ /**
+ * Enables validation that data files added concurrently do not conflict with this commit's operation.
+ * <p>
+ * This method should be called while committing non-idempotent overwrite operations.
+ * If a concurrent operation commits a new file after the data was read and that file might
+ * contain rows matching the specified conflict detection filter, the overwrite operation
+ * will detect this during retries and fail.
+ * <p>
+ * Calling this method with a correct conflict detection filter is required to maintain
+ * serializable isolation for overwrite operations. Otherwise, the isolation level
+ * will be snapshot isolation.
+ * <p>
+ * Validation uses the conflict detection filter passed to {@link #conflictDetectionFilter(Expression)} and
+ * applies to operations that happened after the snapshot passed to {@link #validateFromSnapshot(long)}.
+ *
+ * @return this for method chaining
+ */
+ OverwriteFiles validateNoConflictingDataFiles();
+
+ /**
+ * Enables validation that delete files added concurrently do not conflict with this commit's operation.
+ * <p>
+ * Validating concurrently added delete files is required during non-idempotent overwrite operations.
+ * If a concurrent operation adds a new delete file that applies to one of the data files being overwritten,
+ * the overwrite operation must be aborted as it may undelete rows that were removed concurrently.
+ * <p>
+ * Calling this method with a correct conflict detection filter is required to maintain
+ * serializable isolation for overwrite operations.
+ * <p>
+ * Validation uses the conflict detection filter passed to {@link #conflictDetectionFilter(Expression)} and
+ * applies to operations that happened after the snapshot passed to {@link #validateFromSnapshot(long)}.
+ *
+ * @return this for method chaining
+ */
+ OverwriteFiles validateNoConflictingDeleteFiles();
Review comment:
I'd prefer to call this `validateNoConflictingDeletes()` because it will call `failMissingDeletePaths()` to check that the data files being replaced weren't removed concurrently, and it will also validate that there are no new delete files. That checks all deletes, not just delete files.

I think that's the right behavior, too. We don't want to separate this into multiple validations because I don't think there is a case where you'd want to validate only removed data files or only new delete files.
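The methods added in the hunk are meant to be chained on an overwrite before it commits. The sketch below models that flow in plain Java under stated assumptions; it is not the Iceberg API. `OverwriteSketch`, `AddedFile`, and `validate` are hypothetical names, the conflict detection filter is reduced to a `Predicate<AddedFile>` that answers "might this file contain matching rows?" (standing in for an Iceberg `Expression` evaluated against file metadata), and snapshot ids are treated as increasing numbers. Real Iceberg snapshot ids are not ordered this way; the library instead examines the snapshots committed after the starting one.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical file metadata: the snapshot that added the file and the
// min/max of one column. Not an Iceberg type.
record AddedFile(String path, long addedInSnapshot, int minDay, int maxDay) {}

// Hypothetical model of the validation flow; not org.apache.iceberg.OverwriteFiles.
class OverwriteSketch {
  // Mirrors the Javadoc default: with no filter set, every concurrently added file may conflict.
  private Predicate<AddedFile> conflictDetectionFilter = f -> true;
  // Hypothetical default: treat every file as concurrent unless a starting snapshot is given.
  private long startingSnapshotId = -1L;
  private boolean validateDataFiles = false;

  OverwriteSketch conflictDetectionFilter(Predicate<AddedFile> filter) {
    this.conflictDetectionFilter = filter;
    return this;
  }

  OverwriteSketch validateFromSnapshot(long snapshotId) {
    this.startingSnapshotId = snapshotId;
    return this;
  }

  OverwriteSketch validateNoConflictingDataFiles() {
    this.validateDataFiles = true;
    return this;
  }

  // Run on each commit attempt against the files currently in the table.
  void validate(List<AddedFile> tableFiles) {
    if (!validateDataFiles) {
      return; // concurrent appends are not checked: snapshot isolation
    }
    for (AddedFile file : tableFiles) {
      // Simplification: a larger id means the file was committed after the data was read.
      boolean concurrent = file.addedInSnapshot() > startingSnapshotId;
      if (concurrent && conflictDetectionFilter.test(file)) {
        throw new IllegalStateException("Found conflicting file: " + file.path());
      }
    }
  }
}
```

For example, an overwrite that read rows where `day = 10` at snapshot 5 would set a filter such as `f -> f.minDay() <= 10 && f.maxDay() >= 10`. A concurrent file covering days 11 to 12 passes, while one covering days 9 to 10 fails the commit. Leaving the default true filter in place would make every concurrent append look like a conflict.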
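The review argues that both delete checks belong behind one method. As a sketch under simplifying assumptions, a single `validateNoConflictingDeletes` could run them together as shown below. `DeleteFileInfo` and `DeleteValidationSketch` are hypothetical names, delete files are modeled like position deletes that name the data files they apply to, and snapshot ids are again treated as increasing. The first loop only approximates what the comment says `failMissingDeletePaths()` checks; it is not that method.

```java
import java.util.List;
import java.util.Set;

// Hypothetical delete file metadata: the snapshot that added it and the data
// file paths it applies to (modeled like a position delete). Not an Iceberg type.
record DeleteFileInfo(String path, long addedInSnapshot, Set<String> appliesTo) {}

// Hypothetical sketch of the single check proposed in the review; not Iceberg code.
final class DeleteValidationSketch {
  private DeleteValidationSketch() {}

  static void validateNoConflictingDeletes(
      Set<String> replacedDataFiles,
      Set<String> currentDataFiles,
      List<DeleteFileInfo> currentDeleteFiles,
      long startingSnapshotId) {
    // Check 1: every data file being replaced must still be in the table,
    // i.e. it was not removed by a concurrent commit.
    for (String path : replacedDataFiles) {
      if (!currentDataFiles.contains(path)) {
        throw new IllegalStateException("Replaced file was removed concurrently: " + path);
      }
    }
    // Check 2: no delete file committed after the starting snapshot may apply to a
    // replaced data file; otherwise the overwrite could undelete those rows.
    for (DeleteFileInfo delete : currentDeleteFiles) {
      if (delete.addedInSnapshot() <= startingSnapshotId) {
        continue; // already visible when the data was read
      }
      for (String target : delete.appliesTo()) {
        if (replacedDataFiles.contains(target)) {
          throw new IllegalStateException(
              "New delete file " + delete.path() + " applies to replaced file " + target);
        }
      }
    }
  }
}
```

Keeping both checks behind one call means a caller cannot enable one and forget the other, which matches the point that there is no case where only one of them is wanted.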