rdblue commented on a change in pull request #3069:
URL: https://github.com/apache/iceberg/pull/3069#discussion_r716264198



##########
File path: core/src/main/java/org/apache/iceberg/BaseOverwriteFiles.java
##########
@@ -127,8 +140,20 @@ protected void validate(TableMetadata base) {
       }
     }
 
-    if (conflictDetectionFilter != null && base.currentSnapshot() != null) {
-      validateAddedDataFiles(base, startingSnapshotId, 
conflictDetectionFilter, caseSensitive);
+    if (appendConflictDetectionFilter != null && base.currentSnapshot() != 
null) {
+      validateAddedDataFiles(base, startingSnapshotId, 
appendConflictDetectionFilter, caseSensitive);
+    }
+
+    boolean validateNewDeletes = deleteConflictDetectionFilter != null && 
base.currentSnapshot() != null;

Review comment:
       I think that the behavior here should be slightly different. There are 
two concerns: 1) whether to check delete files for snapshot isolation and 2) 
what conflict detection filter to use. Basing `validateNewDeletes` on whether 
the conflict detection filter was set doesn't seem correct to me.
   
   I don't think there is a case where we don't want to validate delete files 
if we have called `validateFromSnapshot` to set the base snapshot. I think that 
we should add this as a boolean field that is set when `validateFromSnapshot` 
is called.
   
   Then, if we are validating delete files, we should have two separate checks. 
First, if there are any files in `deletedDataFiles`, then we perform the 
validation below. If the conflict detection filter wasn't set, then we should 
use `Expressions.alwaysTrue` to find candidate delete files. Second, if an 
overwrite filter was set, then we should run `validateNoNewDeletes` with either 
the delete filter or the delete conflict detection filter. The conflict 
detection filter should be an optimization, not a way to turn off delete 
validations.
   
   I think that makes the API more understandable and consistent.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to