szehon-ho commented on code in PR #4520:
URL: https://github.com/apache/iceberg/pull/4520#discussion_r845760260
##########
core/src/main/java/org/apache/iceberg/BaseFilesTable.java:
##########
@@ -92,7 +94,26 @@ public TableScan appendsAfter(long fromSnapshotId) {
protected CloseableIterable<FileScanTask> planFiles(TableOperations ops,
Snapshot snapshot, Expression rowFilter,
boolean
ignoreResiduals, boolean caseSensitive,
boolean colStats) {
- CloseableIterable<ManifestFile> filtered = filterManifests(manifests(),
rowFilter, caseSensitive);
+ Map<Integer, PartitionSpec> specsById = table().specs();
+
+ LoadingCache<Integer, ManifestEvaluator> evalCache =
Caffeine.newBuilder().build(specId -> {
+ PartitionSpec spec = specsById.get(specId);
+ PartitionSpec transformedSpec = transformSpec(fileSchema, spec,
PARTITION_FIELD_PREFIX);
+ return ManifestEvaluator.forRowFilter(rowFilter, transformedSpec,
caseSensitive);
+ });
+
+ CloseableIterable<ManifestFile> filtered = CloseableIterable.filter(
+ manifests(),
+ manifest -> {
+ PartitionSpec spec = specsById.get(manifest.partitionSpecId());
+
+ if (spec.fields().stream().anyMatch(f ->
f.transform().equals(Transforms.alwaysNull()))) {
Review Comment:
The reason for this ugly block is V1 tables where partition fields are
removed (see TestMetadataTableFilters.testPartitionSpecEvolutionRemovalV1).
In V1 tables, removing partition fields replaces the transform with a "void"
transform. What happens is that when we use this spec on an expression against
a removed partition field to evaluate a manifest file, it checks whether the
manifest's partition value for that removed key is null, which is incorrect.
Example, if we remove a field "data_bucket" from partition spec, then this
code would check whether manifest files have null "data_bucket" values, and in
the test would filter out all the newer manifests incorrectly. So hence
skipping out of this case.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]