nastra commented on code in PR #13222:
URL: https://github.com/apache/iceberg/pull/13222#discussion_r2191655872


##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -224,7 +235,9 @@ List<ManifestFile> filterManifests(Schema tableSchema, 
List<ManifestFile> manife
   private boolean canTrustManifestReferences(List<ManifestFile> manifests) {
     Set<String> manifestLocations =
         manifests.stream().map(ManifestFile::path).collect(Collectors.toSet());
-    return allDeletesReferenceManifests && 
manifestLocations.containsAll(manifestsWithDeletes);

Review Comment:
   A good example that makes it clearer is when running 
`TestDeleteFiles.removingDataFileByPathAlsoRemovesDV` (or any other new tests 
that I added). When we hit `canTrustManifestReferences()` the 
`allDeletesReferenceManifests` flag is set to `true`, the `manifestLocations` 
contain one entry but `manifestsWithDeletes` is actually empty. 
   This results in `canTrustManifestReferences` being evaluated to `true`, 
which will then exit early in 
https://github.com/apache/iceberg/blob/3e6decc0bae1ca6ba875170036d151730c8b287b/core/src/main/java/org/apache/iceberg/ManifestFilterManager.java#L357-L360
   which is because it also exited early in 
https://github.com/apache/iceberg/blob/3e6decc0bae1ca6ba875170036d151730c8b287b/core/src/main/java/org/apache/iceberg/ManifestFilterManager.java#L385-L387
 
   The issue there is that `manifestsWithDeletes` in that line is still empty 
and so this condition there can never be `true`. IMO `trustManifestReferences` 
should be `false` in that case because `manifestsWithDeletes` was empty.
   
   
   > The above block would always be false when allDeletesReferenceManifests is 
true, as any of these conditions would render allDeletesReferenceManifests 
false.
   
   Not necessarily. While this might be true for the `DataFileFilterManager` 
it's slightly different for the `DeleteFileFilterManager`. Just assume you're 
deleting a DataFile, which renders `allDeletesReferenceManifests` to `false` in 
`DataFileFilterManager`.  In the `DeleteFileFilterManager` 
`allDeletesReferenceManifests` is `true` by default. 
   I haven't explored this, but maybe we would need to set 
`allDeletesReferenceManifests = false` as well when 
`removeDanglingDeletesFor()` is being called but I think adding this additional 
non-empty check here still makes sense to me.
   
   I'll go ahead and merge the PR to unblock the release. If necessary we can 
follow up here afterwards.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to