aokolnychyi commented on code in PR #4520:
URL: https://github.com/apache/iceberg/pull/4520#discussion_r845570516


##########
core/src/main/java/org/apache/iceberg/BaseFilesTable.java:
##########
@@ -113,17 +116,19 @@ public TableScan appendsAfter(long fromSnapshotId) {
      */
     protected abstract CloseableIterable<ManifestFile> manifests();
 
-    private CloseableIterable<ManifestFile> 
filterManifests(CloseableIterable<ManifestFile> manifests,
-                                                            Expression 
rowFilter,
-                                                            boolean 
caseSensitive) {
+
+    private boolean filter(ManifestFile manifestFile, Expression rowFilter, 
boolean caseSensitive) {

Review Comment:
   Instead of giving up if the manifest spec ID does not match the current 
table spec, I think we should use the manifest spec to project the row filter 
and push down whatever we can. Remember that the partition type will be a union 
of all partition columns ever used in the table.
   
   Another problem with the current implementation is that we call 
`transformSpec` and create a manifest evaluator for every manifest file. That's 
expensive. We usually use `LoadingCache` to reuse manifest evaluators.
   
   Also, why not use `ManifestEvalutor.forRowFilter` instead of doing the 
projection ourselves?
   
   Will something like this work if we modify `planFiles`?
   
   ```
   Map<Integer, PartitionSpec> specsById = ops.current().specsById();
   
   LoadingCache<Integer, ManifestEvaluator> evalCache = 
Caffeine.newBuilder().build(specId -> {
     PartitionSpec spec = specsById.get(specId);
     PartitionSpec transformedSpec = transformSpec(fileSchema, spec, 
PARTITION_FIELD_PREFIX);
     return ManifestEvaluator.forRowFilter(rowFilter, transformedSpec, 
caseSensitive);
   });
   
   CloseableIterable<ManifestFile> filtered = CloseableIterable.filter(
       manifests(),
       manifest -> evalCache.get(manifest.partitionSpecId()).eval(manifest));
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to