[GitHub] [iceberg] szehon-ho opened a new pull request, #4637: Core: Fix Partitions table Filtering for Evolved Partition Specs

GitBox Tue, 26 Apr 2022 16:52:41 -0700


szehon-ho opened a new pull request, #4637:
URL: https://github.com/apache/iceberg/pull/4637


   The Partitions metadata table's predicate logic always uses the table's 
current partition spec and never the older ones.  This leads to errors when the 
table has more than one spec and a filter is applied to the partitions table, 
for example:
   
   ```
   Cannot find field 'data' in struct: struct<>
   org.apache.iceberg.exceptions.ValidationException: Cannot find field 'data' in struct: struct<>
        at app//org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:50)
        at app//org.apache.iceberg.expressions.NamedReference.bind(NamedReference.java:46)
        at app//org.apache.iceberg.expressions.NamedReference.bind(NamedReference.java:27)
        at app//org.apache.iceberg.expressions.UnboundPredicate.bind(UnboundPredicate.java:106)
        at app//org.apache.iceberg.expressions.Binder$BindVisitor.predicate(Binder.java:145)
        at app//org.apache.iceberg.expressions.Binder$BindVisitor.predicate(Binder.java:104)
        at app//org.apache.iceberg.expressions.ExpressionVisitors.visit(ExpressionVisitors.java:330)
        at app//org.apache.iceberg.expressions.Binder.bind(Binder.java:62)
        at app//org.apache.iceberg.expressions.ManifestEvaluator.<init>(ManifestEvaluator.java:68)
        at app//org.apache.iceberg.expressions.ManifestEvaluator.forPartitionFilter(ManifestEvaluator.java:63)
        at app//org.apache.iceberg.ManifestGroup.lambda$entries$9(ManifestGroup.java:209)
        at app//com.github.benmanes.caffeine.cache.LocalLoadingCache.lambda$newMappingFunction$2(LocalLoadingCache.java:141)
        at app//com.github.benmanes.caffeine.cache.UnboundedLocalCache.lambda$computeIfAbsent$2(UnboundedLocalCache.java:238)
        at [email protected]/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705)
        at app//com.github.benmanes.caffeine.cache.UnboundedLocalCache.computeIfAbsent(UnboundedLocalCache.java:234)
        at app//com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:108)
        at app//com.github.benmanes.caffeine.cache.LocalLoadingCache.get(LocalLoadingCache.java:54)
        at app//org.apache.iceberg.ManifestGroup.lambda$entries$10(ManifestGroup.java:222)
        at app//org.apache.iceberg.relocated.com.google.common.collect.Iterators$5.computeNext(Iterators.java:670)
        at app//org.apache.iceberg.relocated.com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:146)
        at app//org.apache.iceberg.relocated.com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:141)
        at app//org.apache.iceberg.relocated.com.google.common.collect.Iterators$5.computeNext(Iterators.java:668)
        at app//org.apache.iceberg.relocated.com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:146)
        at app//org.apache.iceberg.relocated.com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:141)
        at app//org.apache.iceberg.relocated.com.google.common.collect.Iterators$5.computeNext(Iterators.java:668)
        at app//org.apache.iceberg.relocated.com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:146)
        at app//org.apache.iceberg.relocated.com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:141)
        at app//org.apache.iceberg.relocated.com.google.common.collect.TransformedIterator.hasNext(TransformedIterator.java:46)
        at app//org.apache.iceberg.relocated.com.google.common.collect.TransformedIterator.hasNext(TransformedIterator.java:46)
        at app//org.apache.iceberg.util.ParallelIterable$ParallelIterator.submitNextTask(ParallelIterable.java:130)
        at app//org.apache.iceberg.util.ParallelIterable$ParallelIterator.checkTasks(ParallelIterable.java:118)
        at app//org.apache.iceberg.util.ParallelIterable$ParallelIterator.hasNext(ParallelIterable.java:155)
        at app//org.apache.iceberg.PartitionsTable.partitions(PartitionsTable.java:106)
        at app//org.apache.iceberg.PartitionsTable.task(PartitionsTable.java:77)
        at app//org.apache.iceberg.PartitionsTable.access$400(PartitionsTable.java:36)
        at app//org.apache.iceberg.PartitionsTable$PartitionsScan.lambda$new$0(PartitionsTable.java:187)
        at app//org.apache.iceberg.StaticTableScan.doPlanFiles(StaticTableScan.java:47)
        at app//org.apache.iceberg.BaseTableScan.planFiles(BaseTableScan.java:195)
        at app//org.apache.iceberg.spark.source.SparkBatchQueryScan.files(SparkBatchQueryScan.java:114)
        at app//org.apache.iceberg.spark.source.SparkBatchQueryScan.tasks(SparkBatchQueryScan.java:128)
        at app//org.apache.iceberg.spark.source.SparkScan.toBatch(SparkScan.java:108)
   ```
   
   This PR fixes the issue by building a cache of specs and ManifestEvaluators 
and using it during filtering.  This is similar to 
https://github.com/apache/iceberg/pull/4520.
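
   The per-spec caching idea can be illustrated with a minimal, self-contained 
sketch.  It does not use the Iceberg classes and is not the code in this PR: 
`Spec`, `Manifest`, `evaluatorFor` and `filter` are hypothetical stand-ins, and 
keeping every manifest whose spec lacks the filtered field is an assumption of 
the sketch.  The point shown is only that each manifest is evaluated with an 
evaluator built from the spec it was written with, and that the evaluator is 
built once per spec ID.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

class SpecEvaluatorCacheSketch {
    // Hypothetical stand-in for a partition spec: an ID plus its partition field names.
    static final class Spec {
        final int specId;
        final Set<String> fields;
        Spec(int specId, Set<String> fields) { this.specId = specId; this.fields = fields; }
    }

    // Hypothetical stand-in for a manifest: it remembers the spec ID it was written with.
    static final class Manifest {
        final String path;
        final int specId;
        final Map<String, String> partition;
        Manifest(String path, int specId, Map<String, String> partition) {
            this.path = path; this.specId = specId; this.partition = partition;
        }
    }

    // Builds an evaluator against ONE spec. Assumption of this sketch: if the spec
    // has no such field, nothing can be pruned, so every manifest is kept.
    static Predicate<Manifest> evaluatorFor(Spec spec, String field, String value) {
        if (!spec.fields.contains(field)) {
            return manifest -> true;
        }
        return manifest -> value.equals(manifest.partition.get(field));
    }

    // Filters manifests, creating at most one evaluator per spec ID.
    static List<String> filter(Map<Integer, Spec> specsById, List<Manifest> manifests,
                               String field, String value) {
        Map<Integer, Predicate<Manifest>> evaluators = new HashMap<>();
        List<String> kept = new ArrayList<>();
        for (Manifest m : manifests) {
            Predicate<Manifest> eval = evaluators.computeIfAbsent(
                m.specId, id -> evaluatorFor(specsById.get(id), field, value));
            if (eval.test(m)) {
                kept.add(m.path);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<Integer, Spec> specs = new HashMap<>();
        specs.put(0, new Spec(0, Set.of()));        // original, unpartitioned spec
        specs.put(1, new Spec(1, Set.of("data")));  // evolved spec, partitioned by data

        List<Manifest> manifests = List.of(
            new Manifest("m0.avro", 0, Map.of()),
            new Manifest("m1.avro", 1, Map.of("data", "a")),
            new Manifest("m2.avro", 1, Map.of("data", "b")));

        System.out.println(filter(specs, manifests, "data", "a"));
    }
}
```

   In the sketch, a manifest written under the unpartitioned spec is never 
evaluated against a filter bound to the evolved spec, which is the mismatch 
behind the `struct<>` error above.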


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

