rdblue commented on a change in pull request #3199:
URL: https://github.com/apache/iceberg/pull/3199#discussion_r718939020



##########
File path: api/src/main/java/org/apache/iceberg/OverwriteFiles.java
##########
@@ -145,4 +151,50 @@
    */
   @Deprecated
   OverwriteFiles validateNoConflictingAppends(Long readSnapshotId, Expression conflictDetectionFilter);
+
+  /**
+   * Sets a conflict detection filter used to validate concurrently added data and delete files.
+   * <p>
+   * If not called, a true literal will be used as the conflict detection filter.
+   *
+   * @param conflictDetectionFilter an expression on rows in the table
+   * @return this for method chaining
+   */
+  OverwriteFiles conflictDetectionFilter(Expression conflictDetectionFilter);
+
+  /**
+   * Enables validation that data files added concurrently do not conflict with this commit's operation.
+   * <p>
+   * This method should be called while committing non-idempotent overwrite operations.
+   * If a concurrent operation commits a new file after the data was read and that file might
+   * contain rows matching the specified conflict detection filter, the overwrite operation
+   * will detect this during retries and fail.
+   * <p>
+   * Calling this method with a correct conflict detection filter is required to maintain
+   * serializable isolation for overwrite operations. Otherwise, the isolation level
+   * will be snapshot isolation.
+   * <p>
+   * Validation uses the conflict detection filter passed to {@link #conflictDetectionFilter(Expression)} and
+   * applies to operations that happened after the snapshot passed to {@link #validateFromSnapshot(long)}.
+   *
+   * @return this for method chaining
+   */
+  OverwriteFiles validateNoConflictingDataFiles();
+
+  /**
+   * Enables validation that delete files added concurrently do not conflict with this commit's operation.
+   * <p>
+   * Validating concurrently added delete files is required during non-idempotent overwrite operations.
+   * If a concurrent operation adds a new delete file that applies to one of the data files being overwritten,
+   * the overwrite operation must be aborted as it may undelete rows that were removed concurrently.
+   * <p>
+   * Calling this method with a correct conflict detection filter is required to maintain
+   * serializable isolation for overwrite operations.
+   * <p>
+   * Validation uses the conflict detection filter passed to {@link #conflictDetectionFilter(Expression)} and
+   * applies to operations that happened after the snapshot passed to {@link #validateFromSnapshot(long)}.
+   *
+   * @return this for method chaining
+   */
+  OverwriteFiles validateNoConflictingDeleteFiles();

Review comment:
   I'd prefer to call this `validateNoConflictingDeletes()`. It will call
`failMissingDeletePaths()` to check that the data files being replaced weren't
removed, and it will also validate that there are no new delete files. That
checks all deletes, not just delete files.

   I think that's the right behavior, too. We shouldn't split this into
multiple validations, because I don't think there is a case where you'd want to
validate only deleted data files or only delete files.
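
   As a rough illustration of the combined check described above, here is a
minimal, self-contained sketch. It is not Iceberg code: the class
`ConflictingDeletesSketch`, the `DeleteFile` type, and the idea that a delete
file names a single data file it applies to are all simplifying assumptions
made for the example. Only the method name `validateNoConflictingDeletes` comes
from the suggestion above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model for illustration only; none of these types are Iceberg classes.
public class ConflictingDeletesSketch {

  // Simplifying assumption: each delete file applies to exactly one data file path.
  static final class DeleteFile {
    final String path;
    final String referencedDataFile;

    DeleteFile(String path, String referencedDataFile) {
      this.path = path;
      this.referencedDataFile = referencedDataFile;
    }
  }

  // One validation that covers both kinds of concurrent delete:
  // (1) a data file being replaced was removed by another commit, and
  // (2) a new delete file was added that applies to a data file being replaced.
  static void validateNoConflictingDeletes(
      Set<String> replacedDataFiles,
      Set<String> concurrentlyRemovedDataFiles,
      List<DeleteFile> concurrentlyAddedDeleteFiles) {
    for (String dataFile : replacedDataFiles) {
      if (concurrentlyRemovedDataFiles.contains(dataFile)) {
        throw new IllegalStateException("Data file was removed concurrently: " + dataFile);
      }
    }
    for (DeleteFile delete : concurrentlyAddedDeleteFiles) {
      if (replacedDataFiles.contains(delete.referencedDataFile)) {
        throw new IllegalStateException(
            "New delete file " + delete.path + " applies to " + delete.referencedDataFile);
      }
    }
  }

  // Convenience wrapper: true if either kind of conflicting delete is found.
  static boolean conflicts(
      Set<String> replacedDataFiles,
      Set<String> concurrentlyRemovedDataFiles,
      List<DeleteFile> concurrentlyAddedDeleteFiles) {
    try {
      validateNoConflictingDeletes(
          replacedDataFiles, concurrentlyRemovedDataFiles, concurrentlyAddedDeleteFiles);
      return false;
    } catch (IllegalStateException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    Set<String> replaced = new HashSet<>(Arrays.asList("data-1.parquet", "data-2.parquet"));

    // A replaced data file was removed by a concurrent commit.
    Set<String> removed = Collections.singleton("data-1.parquet");
    System.out.println(conflicts(replaced, removed, Collections.<DeleteFile>emptyList()));

    // A concurrent commit added a delete file that applies to a replaced data file.
    List<DeleteFile> added =
        Collections.singletonList(new DeleteFile("delete-1.parquet", "data-2.parquet"));
    System.out.println(conflicts(replaced, Collections.<String>emptySet(), added));
  }
}
```

   Both cases go through the same entry point, which is why a single name that
covers "deletes" in general seems to fit better than one that mentions only
delete files.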




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


