openinx commented on a change in pull request #2865:
URL: https://github.com/apache/iceberg/pull/2865#discussion_r679711940
##########
File path: core/src/main/java/org/apache/iceberg/BaseRewriteFiles.java
##########
@@ -85,4 +90,18 @@ public RewriteFiles rewriteFiles(Set<DataFile> dataFilesToDelete, Set<DeleteFile
return this;
}
+
+ @Override
+ public RewriteFiles validateFromSnapshot(long snapshotId) {
+ this.startingSnapshotId = snapshotId;
+ return this;
+ }
+
+ @Override
+ protected void validate(TableMetadata base) {
+ if (replacedDataFiles.size() > 0) {
+ // if there are replaced data files, there cannot be any new row-level deletes for those data files
+ validateNoNewDeletesForDataFiles(base, startingSnapshotId, replacedDataFiles);
Review comment:
> We could also build one that compacts just data files, but in order for that operation to be correct it would also need to rewrite delete files (that are pointing to the compacted data files) or use an older sequence number with equality deletes
Do you mean that people would need to know whether a data-files-only compaction will break the existing data set, and that this puts too much burden on users? I think we still need to provide the validation, because we as developers know that those cases will break data correctness.
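
To make the concern concrete, here is a minimal sketch of how a data-files-only compaction would lean on the new validation. It is not code from this PR; the `commitCompaction` helper, the table handle, and the file sets are hypothetical, and only `validateFromSnapshot` / `rewriteFiles` / `commit` are assumed to be the real API:

```java
import java.util.Set;

import org.apache.iceberg.DataFile;
import org.apache.iceberg.Table;

class CompactionSketch {
  // Hypothetical helper: commit a data-files-only compaction while guarding
  // against row-level deletes committed after the planning snapshot.
  static void commitCompaction(Table table,
                               long startingSnapshotId,
                               Set<DataFile> filesToRewrite,
                               Set<DataFile> compactedFiles) {
    table.newRewrite()
        // snapshot the compaction was planned from
        .validateFromSnapshot(startingSnapshotId)
        // replace the original data files with the compacted ones
        .rewriteFiles(filesToRewrite, compactedFiles)
        // validate(TableMetadata) runs during commit and should fail if new
        // position/equality deletes reference filesToRewrite
        .commit();
  }
}
```

Without the `validateFromSnapshot` + `validateNoNewDeletesForDataFiles` check above, a commit like this could silently drop deletes that another writer added against `filesToRewrite` after the planning snapshot, which is exactly the correctness issue being discussed.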