szehon-ho commented on code in PR #4520:
URL: https://github.com/apache/iceberg/pull/4520#discussion_r845760260


##########
core/src/main/java/org/apache/iceberg/BaseFilesTable.java:
##########
@@ -92,7 +94,26 @@ public TableScan appendsAfter(long fromSnapshotId) {
     protected CloseableIterable<FileScanTask> planFiles(TableOperations ops, 
Snapshot snapshot, Expression rowFilter,
                                                         boolean 
ignoreResiduals, boolean caseSensitive,
                                                         boolean colStats) {
-      CloseableIterable<ManifestFile> filtered = filterManifests(manifests(), 
rowFilter, caseSensitive);
+      Map<Integer, PartitionSpec> specsById = table().specs();
+
+      LoadingCache<Integer, ManifestEvaluator> evalCache = 
Caffeine.newBuilder().build(specId -> {
+        PartitionSpec spec = specsById.get(specId);
+        PartitionSpec transformedSpec = transformSpec(fileSchema, spec, 
PARTITION_FIELD_PREFIX);
+        return ManifestEvaluator.forRowFilter(rowFilter, transformedSpec, 
caseSensitive);
+      });
+
+      CloseableIterable<ManifestFile> filtered = CloseableIterable.filter(
+          manifests(),
+          manifest -> {
+            PartitionSpec spec = specsById.get(manifest.partitionSpecId());
+
+            if (spec.fields().stream().anyMatch(f -> 
f.transform().equals(Transforms.alwaysNull()))) {

Review Comment:
   The reason for this ugly block is V1 tables where partition fields are 
removed (see TestMetadataTableFilters.testPartitionSpecEvolutionRemovalV1).
   
   In V1 tables, removing partition fields replaces the transform with a "void" 
transform.  What happens is that when we use this spec to evaluate a manifest 
file, it checks whether the manifest's partition value for that removed key is 
null, which is incorrect.
   
   Example, if we remove a field "data_bucket" from partition spec, then this 
code would check whether manifest files have null "data_bucket" values, and in 
the test would filter out all the newer manifests incorrectly.  So hence 
skipping out of this case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to