aokolnychyi commented on a change in pull request #351: Extend Iceberg with a 
way to overwrite files for eager updates/deletes
URL: https://github.com/apache/incubator-iceberg/pull/351#discussion_r317398607
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/OverwriteData.java
 ##########
 @@ -88,6 +119,42 @@ public OverwriteFiles validateAddedFiles() {
       }
     }
 
+    if (conflictDetectionFilter != null) {
+      PartitionSpec spec = writeSpec();
+      Expression inclusiveExpr = 
Projections.inclusive(spec).project(conflictDetectionFilter);
+      Evaluator inclusive = new Evaluator(spec.partitionType(), inclusiveExpr);
+
+      InclusiveMetricsEvaluator metrics = new 
InclusiveMetricsEvaluator(base.schema(), conflictDetectionFilter);
+
+      List<DataFile> newFiles = collectNewFiles(base);
+      for (DataFile newFile : newFiles) {
+        ValidationException.check(
+            !inclusive.eval(newFile.partition()) || !metrics.eval(newFile),
+            "A conflicting file was appended that matches filter '%s': %s",
+            conflictDetectionFilter, newFile.path());
+      }
+    }
+
     return super.apply(base);
   }
+
+  private List<DataFile> collectNewFiles(TableMetadata meta) {
+    List<DataFile> newFiles = new ArrayList<>();
+
+    Long currentSnapshotId = meta.currentSnapshot() == null ? null : 
meta.currentSnapshot().snapshotId();
+    while (currentSnapshotId != null && 
!currentSnapshotId.equals(readSnapshotId)) {
 
 Review comment:
   This part is only invoked if we call `validateNoConflictingAppends`, which 
always accepts a valid `readSnapshotId` and an expression for detecting 
conflicts. If `validateNoConflictingAppends` is not called, there will be no 
validation and the behavior would be as it was before this change. 
   
   We will call `validateNoConflictingAppends(null, expr)` only if we start 
with an empty table, meaning that we have to scan the entire history.
   
   We can consider setting `readSnapshotId` and `conflictDetectionFilter` 
independently, but I am not sure whether it gives us any benefits. Also, it 
will lead to correctness errors if we won't set `readSnapshotId` in 
transactions correctly.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to