openinx commented on a change in pull request #2865:
URL: https://github.com/apache/iceberg/pull/2865#discussion_r679711940



##########
File path: core/src/main/java/org/apache/iceberg/BaseRewriteFiles.java
##########
@@ -85,4 +90,18 @@ public RewriteFiles rewriteFiles(Set<DataFile> 
dataFilesToDelete, Set<DeleteFile
 
     return this;
   }
+
+  @Override
+  public RewriteFiles validateFromSnapshot(long snapshotId) {
+    this.startingSnapshotId = snapshotId;
+    return this;
+  }
+
+  @Override
+  protected void validate(TableMetadata base) {
+    if (replacedDataFiles.size() > 0) {
+      // if there are replaced data files, there cannot be any new row-level 
deletes for those data files
+      validateNoNewDeletesForDataFiles(base, startingSnapshotId, 
replacedDataFiles);

Review comment:
       > We could also build one that compacts just data files, but in order 
for that operation to be correct it would also need to rewrite delete files 
(that are pointing to the compacted data files) or use an older sequence number 
with equality deletes
   
   You mean people will need to know whether the  data-files-only compaction 
will break the existing data set ? This puts too much burden on users ?  I 
think we still need to provide the validation because we developers are sure 
that those cases will break the data correctness ...  




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to