nastra commented on code in PR #13222:
URL: https://github.com/apache/iceberg/pull/13222#discussion_r2191655872
##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -224,7 +235,9 @@ List<ManifestFile> filterManifests(Schema tableSchema,
List<ManifestFile> manife
private boolean canTrustManifestReferences(List<ManifestFile> manifests) {
Set<String> manifestLocations =
manifests.stream().map(ManifestFile::path).collect(Collectors.toSet());
- return allDeletesReferenceManifests &&
manifestLocations.containsAll(manifestsWithDeletes);
Review Comment:
A good example that makes it clearer is when running
`TestDeleteFiles.removingDataFileByPathAlsoRemovesDV` (or any other new tests
that I added). When we hit `canTrustManifestReferences()` the
`allDeletesReferenceManifests` flag is set to `true`, the `manifestLocations`
contain one entry but `manifestsWithDeletes` is actually empty.
This results in `canTrustManifestReferences` being evaluated to `true`,
which will then exit early in
https://github.com/apache/iceberg/blob/3e6decc0bae1ca6ba875170036d151730c8b287b/core/src/main/java/org/apache/iceberg/ManifestFilterManager.java#L357-L360
which is because it also exited early in
https://github.com/apache/iceberg/blob/3e6decc0bae1ca6ba875170036d151730c8b287b/core/src/main/java/org/apache/iceberg/ManifestFilterManager.java#L385-L387
The issue there is that `manifestsWithDeletes` in that line is still empty
and so this condition there can never be `true`. IMO `trustManifestReferences`
should be `false` in that case because `manifestsWithDeletes` was empty.
> The above block would always be false when allDeletesReferenceManifests is
true, as any of these conditions would render allDeletesReferenceManifests
false.
Not necessarily. While this might be true for the `DataFileFilterManager`
it's slightly different for the `DeleteFileFilterManager`. Just assume you're
deleting a DataFile, which renders `allDeletesReferenceManifests` to `false` in
`DataFileFilterManager`. In the `DeleteFileFilterManager`
`allDeletesReferenceManifests` is `true` by default.
I haven't explored this, but maybe we would need to set
`allDeletesReferenceManifests = false` as well when
`removeDanglingDeletesFor()` is being called but I think adding this additional
non-empty check here still makes sense to me.
I'll go ahead and merge the PR to unblock the release. If necessary we can
follow up here afterwards.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]