stevenzwu commented on code in PR #13222:
URL: https://github.com/apache/iceberg/pull/13222#discussion_r2183270318


##########
core/src/test/java/org/apache/iceberg/TestRewriteFiles.java:
##########
@@ -777,4 +778,40 @@ public void testNewDeleteFile() {
             .rewriteFiles(Sets.newSet(FILE_A), Sets.newSet(FILE_A2)),
         branch);
   }
+
+  @TestTemplate
+  public void deleteDataFileAlsoRemovesDV() {

Review Comment:
   nit: delete -> remove



##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -155,6 +161,11 @@ void caseSensitive(boolean newCaseSensitive) {
     this.caseSensitive = newCaseSensitive;
   }
 
+  protected void removeDanglingDeletesFor(Set<DataFile> dataFiles) {

Review Comment:
   Should we move this into `DeleteFileFilterManager`, since it only applies 
there?
   
   In `MergingSnapshotProducer`, we can then use the specific data and delete 
classes instead of the base class `ManifestFilterManager`.



##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -452,6 +468,11 @@ private boolean manifestHasDeletedFiles(
     return false;
   }
 
+  private boolean isDanglingDV(DeleteFile file) {
+    return ContentFileUtil.isDV(file)
+        && dataFilePathsWithDanglingDVs.contains(file.referencedDataFile());

Review Comment:
   `removedDataFilePaths` seems more accurate. The removed data files may or 
may not have dangling DVs.
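
   A minimal sketch of why the name matters, using a hypothetical `DeleteFileStub` and plain string paths instead of Iceberg's `DeleteFile` and `ContentFileUtil` (both stand-ins are assumptions for illustration). The set holds every removed data file path; only some of those paths are referenced by a DV:

```java
import java.util.HashSet;
import java.util.Set;

public class DanglingDvSketch {
  // Hypothetical stand-in for a delete file entry; not Iceberg's DeleteFile.
  static final class DeleteFileStub {
    final boolean isDV;
    final String referencedDataFile; // null when not tied to a single data file

    DeleteFileStub(boolean isDV, String referencedDataFile) {
      this.isDV = isDV;
      this.referencedDataFile = referencedDataFile;
    }
  }

  // Same shape as the check in the diff, with the set named for what it holds.
  static boolean isDanglingDV(DeleteFileStub file, Set<String> removedDataFilePaths) {
    return file.isDV && removedDataFilePaths.contains(file.referencedDataFile);
  }

  public static void main(String[] args) {
    Set<String> removedDataFilePaths = new HashSet<>();
    removedDataFilePaths.add("/data/a.parquet");
    removedDataFilePaths.add("/data/b.parquet"); // removed, but no DV references it

    DeleteFileStub dvForA = new DeleteFileStub(true, "/data/a.parquet");
    DeleteFileStub dvForC = new DeleteFileStub(true, "/data/c.parquet");

    // The DV for a.parquet dangles; the DV for c.parquet does not.
    System.out.println(isDanglingDV(dvForA, removedDataFilePaths));
    System.out.println(isDanglingDV(dvForC, removedDataFilePaths));
  }
}
```

   Here `/data/b.parquet` sits in the set without any DV pointing at it, which is the case the current name `dataFilePathsWithDanglingDVs` misdescribes.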



##########
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:
##########
@@ -1130,6 +1132,11 @@ protected ManifestReader<DataFile> newManifestReader(ManifestFile manifest) {
     protected Set<DataFile> newFileSet() {
       return DataFileSet.create();
     }
+
+    @Override

Review Comment:
   I am confused here. Is this method defined in the base class 
`SnapshotProducer`?



##########
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:
##########
@@ -920,6 +920,8 @@ protected Map<String, String> summary() {
 
   @Override
   public List<ManifestFile> apply(TableMetadata base, Snapshot snapshot) {
+    Set<DataFile> filesToBeDeleted = filterManager.filesToBeDeleted();

Review Comment:
   Is this enough? This will include files deleted explicitly via `delete(F 
file)`, but it won't include the data files removed via `deleteExpression` or 
`dropPartition`, which are evaluated in the `filterManifests` step on line 927.
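
   A minimal sketch of the ordering concern, assuming a hypothetical `FilterManagerStub` that only mimics the behavior described above (explicit deletes are known up front, expression matches are discovered while filtering). It is not the `ManifestFilterManager` implementation, and `removedFiles()` is an invented helper:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class FilterOrderSketch {
  // Hypothetical stand-in; paths replace DataFile and a Predicate replaces the expression.
  static final class FilterManagerStub {
    private final Set<String> explicitDeletes = new HashSet<>();
    private final Set<String> removedDuringFilter = new HashSet<>();
    private Predicate<String> deleteExpression = path -> false;

    void delete(String path) {
      explicitDeletes.add(path);
    }

    void deleteByExpression(Predicate<String> expression) {
      this.deleteExpression = expression;
    }

    // Known before filtering: only the explicitly deleted files.
    Set<String> filesToBeDeleted() {
      return new HashSet<>(explicitDeletes);
    }

    // Expression matches are only discovered while scanning the live files.
    List<String> filterManifests(List<String> liveFiles) {
      List<String> kept = new ArrayList<>();
      for (String path : liveFiles) {
        if (explicitDeletes.contains(path) || deleteExpression.test(path)) {
          removedDuringFilter.add(path);
        } else {
          kept.add(path);
        }
      }
      return kept;
    }

    // Invented helper: everything actually removed, available only after filtering.
    Set<String> removedFiles() {
      return new HashSet<>(removedDuringFilter);
    }
  }

  public static void main(String[] args) {
    FilterManagerStub manager = new FilterManagerStub();
    manager.delete("part=1/a.parquet");
    manager.deleteByExpression(path -> path.startsWith("part=2/"));

    Set<String> before = manager.filesToBeDeleted(); // captured before filtering
    manager.filterManifests(
        Arrays.asList("part=1/a.parquet", "part=2/b.parquet", "part=3/c.parquet"));

    // The early snapshot misses the file removed by the expression.
    System.out.println(before.contains("part=2/b.parquet"));
    System.out.println(manager.removedFiles().contains("part=2/b.parquet"));
  }
}
```

   If the real code behaves like this stub, the set of removed data files would need to be collected during or after filtering rather than before it.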



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

