rdblue commented on a change in pull request #3069:
URL: https://github.com/apache/iceberg/pull/3069#discussion_r716264198
##########
File path: core/src/main/java/org/apache/iceberg/BaseOverwriteFiles.java
##########
@@ -127,8 +140,20 @@ protected void validate(TableMetadata base) {
}
}
- if (conflictDetectionFilter != null && base.currentSnapshot() != null) {
- validateAddedDataFiles(base, startingSnapshotId,
conflictDetectionFilter, caseSensitive);
+ if (appendConflictDetectionFilter != null && base.currentSnapshot() !=
null) {
+ validateAddedDataFiles(base, startingSnapshotId,
appendConflictDetectionFilter, caseSensitive);
+ }
+
+ boolean validateNewDeletes = deleteConflictDetectionFilter != null &&
base.currentSnapshot() != null;
Review comment:
I think that the behavior here should be slightly different. There are
two concerns: 1) whether to check delete files for snapshot isolation and 2)
what conflict detection filter to use. Basing `validateNewDeletes` on whether
the conflict detection filter was set doesn't seem correct to me.
I don't think there is a case where we don't want to validate delete files
if we have called `validateFromSnapshot` to set the base snapshot. I think that
we should add this as a boolean field that is set when `validateFromSnapshot`
is called.
Then, if we are validating delete files, we should have two separate checks.
First, if there are any files in `deletedDataFiles`, then we perform the
validation below. If the conflict detection filter wasn't set, then we should
use `Expressions.alwaysTrue` to find candidate delete files. Second, if an
overwrite filter was set, then we should run `validateNoNewDeletes` with either
the delete filter or the delete conflict detection filter. The conflict
detection filter should be an optimization, not a way to turn off delete
validations.
I think that makes the API more understandable and consistent.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]