rdblue commented on a change in pull request #351: Extend Iceberg with a way to overwrite files for eager updates/deletes
URL: https://github.com/apache/incubator-iceberg/pull/351#discussion_r315464103
##########
File path: core/src/main/java/org/apache/iceberg/OverwriteData.java
##########
@@ -88,6 +119,42 @@ public OverwriteFiles validateAddedFiles() {
       }
     }
 
+    if (conflictDetectionFilter != null) {
+      PartitionSpec spec = writeSpec();
+      Expression inclusiveExpr = Projections.inclusive(spec).project(conflictDetectionFilter);
+      Evaluator inclusive = new Evaluator(spec.partitionType(), inclusiveExpr);
+
+      InclusiveMetricsEvaluator metrics = new InclusiveMetricsEvaluator(base.schema(), conflictDetectionFilter);
+
+      List<DataFile> newFiles = collectNewFiles(base);
+      for (DataFile newFile : newFiles) {
+        ValidationException.check(
+            !inclusive.eval(newFile.partition()) || !metrics.eval(newFile),
+            "A conflicting file was appended that matches filter '%s': %s",
+            conflictDetectionFilter, newFile.path());
+      }
+    }
+
     return super.apply(base);
   }
+
+  private List<DataFile> collectNewFiles(TableMetadata meta) {
+    List<DataFile> newFiles = new ArrayList<>();
+
+    Long currentSnapshotId = meta.currentSnapshot() == null ? null : meta.currentSnapshot().snapshotId();
+    while (currentSnapshotId != null && !currentSnapshotId.equals(readSnapshotId)) {

Review comment:
If `readSnapshotId` is null, then does this use the _entire_ table history? Or am I missing where `readSnapshotId` is defaulted to the current snapshot ID when the operation starts?
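
To make the concern about a null `readSnapshotId` concrete, here is a minimal, self-contained sketch of a parent-pointer walk that uses the same loop condition. It is not the Iceberg API: the class `SnapshotWalk`, the methods `walkBack` and `sampleHistory`, and the map from snapshot ID to parent ID are hypothetical stand-ins for the table history. Because `Long.equals(null)` is always false, a null `readSnapshotId` never stops the loop early, so the walk only ends at the root snapshot.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SnapshotWalk {
  // Hypothetical stand-in for table history: snapshot ID -> parent snapshot ID (null for the root).
  public static Map<Long, Long> sampleHistory() {
    Map<Long, Long> parents = new HashMap<>();
    parents.put(1L, null); // root snapshot
    parents.put(2L, 1L);
    parents.put(3L, 2L);   // current snapshot
    return parents;
  }

  // Walks from currentId back through parents, collecting every snapshot visited.
  // The loop condition has the same shape as the one under review.
  public static List<Long> walkBack(Map<Long, Long> parents, Long currentId, Long readSnapshotId) {
    List<Long> visited = new ArrayList<>();
    Long id = currentId;
    while (id != null && !id.equals(readSnapshotId)) {
      visited.add(id);
      id = parents.get(id); // null once the root has been visited
    }
    return visited;
  }

  public static void main(String[] args) {
    Map<Long, Long> parents = sampleHistory();
    System.out.println(walkBack(parents, 3L, 2L));   // prints [3]
    System.out.println(walkBack(parents, 3L, null)); // prints [3, 2, 1]
  }
}
```

In this sketch, passing the snapshot that the operation read limits the walk to snapshots committed after it, while passing null visits every snapshot back to the root. Whether the real code defaults `readSnapshotId` elsewhere is exactly the open question in the comment above.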