szehon-ho opened a new pull request, #4637:
URL: https://github.com/apache/iceberg/pull/4637
The Partitions metadata table's predicate logic always uses the table's
current partition spec and never the older ones. If the table has multiple
specs and a filter is applied to the partitions table, this leads to errors such as:
```
Cannot find field 'data' in struct: struct<>
org.apache.iceberg.exceptions.ValidationException: Cannot find field 'data' in struct: struct<>
    at app//org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:50)
    at app//org.apache.iceberg.expressions.NamedReference.bind(NamedReference.java:46)
    at app//org.apache.iceberg.expressions.NamedReference.bind(NamedReference.java:27)
    at app//org.apache.iceberg.expressions.UnboundPredicate.bind(UnboundPredicate.java:106)
    at app//org.apache.iceberg.expressions.Binder$BindVisitor.predicate(Binder.java:145)
    at app//org.apache.iceberg.expressions.Binder$BindVisitor.predicate(Binder.java:104)
    at app//org.apache.iceberg.expressions.ExpressionVisitors.visit(ExpressionVisitors.java:330)
    at app//org.apache.iceberg.expressions.Binder.bind(Binder.java:62)
    at app//org.apache.iceberg.expressions.ManifestEvaluator.<init>(ManifestEvaluator.java:68)
    at app//org.apache.iceberg.expressions.ManifestEvaluator.forPartitionFilter(ManifestEvaluator.java:63)
    at app//org.apache.iceberg.ManifestGroup.lambda$entries$9(ManifestGroup.java:209)
    at app//com.github.benmanes.caffeine.cache.LocalLoadingCache.lambda$newMappingFunction$2(LocalLoadingCache.java:141)
    at app//com.github.benmanes.caffeine.cache.UnboundedLocalCache.lambda$computeIfAbsent$2(UnboundedLocalCache.java:238)
    at [email protected]/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705)
    at app//com.github.benmanes.caffeine.cache.UnboundedLocalCache.computeIfAbsent(UnboundedLocalCache.java:234)
    at app//com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:108)
    at app//com.github.benmanes.caffeine.cache.LocalLoadingCache.get(LocalLoadingCache.java:54)
    at app//org.apache.iceberg.ManifestGroup.lambda$entries$10(ManifestGroup.java:222)
    at app//org.apache.iceberg.relocated.com.google.common.collect.Iterators$5.computeNext(Iterators.java:670)
    at app//org.apache.iceberg.relocated.com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:146)
    at app//org.apache.iceberg.relocated.com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:141)
    at app//org.apache.iceberg.relocated.com.google.common.collect.Iterators$5.computeNext(Iterators.java:668)
    at app//org.apache.iceberg.relocated.com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:146)
    at app//org.apache.iceberg.relocated.com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:141)
    at app//org.apache.iceberg.relocated.com.google.common.collect.Iterators$5.computeNext(Iterators.java:668)
    at app//org.apache.iceberg.relocated.com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:146)
    at app//org.apache.iceberg.relocated.com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:141)
    at app//org.apache.iceberg.relocated.com.google.common.collect.TransformedIterator.hasNext(TransformedIterator.java:46)
    at app//org.apache.iceberg.relocated.com.google.common.collect.TransformedIterator.hasNext(TransformedIterator.java:46)
    at app//org.apache.iceberg.util.ParallelIterable$ParallelIterator.submitNextTask(ParallelIterable.java:130)
    at app//org.apache.iceberg.util.ParallelIterable$ParallelIterator.checkTasks(ParallelIterable.java:118)
    at app//org.apache.iceberg.util.ParallelIterable$ParallelIterator.hasNext(ParallelIterable.java:155)
    at app//org.apache.iceberg.PartitionsTable.partitions(PartitionsTable.java:106)
    at app//org.apache.iceberg.PartitionsTable.task(PartitionsTable.java:77)
    at app//org.apache.iceberg.PartitionsTable.access$400(PartitionsTable.java:36)
    at app//org.apache.iceberg.PartitionsTable$PartitionsScan.lambda$new$0(PartitionsTable.java:187)
    at app//org.apache.iceberg.StaticTableScan.doPlanFiles(StaticTableScan.java:47)
    at app//org.apache.iceberg.BaseTableScan.planFiles(BaseTableScan.java:195)
    at app//org.apache.iceberg.spark.source.SparkBatchQueryScan.files(SparkBatchQueryScan.java:114)
    at app//org.apache.iceberg.spark.source.SparkBatchQueryScan.tasks(SparkBatchQueryScan.java:128)
    at app//org.apache.iceberg.spark.source.SparkScan.toBatch(SparkScan.java:108)
```
This PR fixes the issue by building a cache of specs and ManifestEvaluators
and using it in the filtering. This is similar to
https://github.com/apache/iceberg/pull/4520.
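
The caching idea can be illustrated with a minimal, stdlib-only sketch. It is not the code in this PR: `PerSpecEvaluatorCache`, `Spec`, `Manifest`, and `mightMatch` are hypothetical stand-ins for Iceberg's partition specs, manifests, and ManifestEvaluators, and the filter is reduced to a single `field = value` equality. The sketch also assumes that a spec without the filtered field cannot prune anything, so manifests written with it are always kept.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

public class PerSpecEvaluatorCache {
  /** Hypothetical stand-in for a partition spec: an ID plus its partition field names. */
  public static final class Spec {
    final int specId;
    final Set<String> fields;

    public Spec(int specId, Set<String> fields) {
      this.specId = specId;
      this.fields = fields;
    }
  }

  /** Hypothetical stand-in for a manifest: the spec ID it was written with and one partition tuple. */
  public static final class Manifest {
    final int specId;
    final Map<String, String> partition;

    public Manifest(int specId, Map<String, String> partition) {
      this.specId = specId;
      this.partition = partition;
    }
  }

  private final Map<Integer, Spec> specsById;
  private final String field;
  private final String value;
  // One evaluator per spec ID, built lazily the first time a manifest with that spec is seen.
  private final Map<Integer, Predicate<Manifest>> evaluators = new ConcurrentHashMap<>();

  public PerSpecEvaluatorCache(Map<Integer, Spec> specsById, String field, String value) {
    this.specsById = specsById;
    this.field = field;
    this.value = value;
  }

  // Bind the filter against one specific spec instead of always using the current one.
  private Predicate<Manifest> bind(Spec spec) {
    if (!spec.fields.contains(field)) {
      // Assumption: the field is not part of this spec, so nothing can be pruned.
      return manifest -> true;
    }
    return manifest -> value.equals(manifest.partition.get(field));
  }

  /** Evaluates the filter with the evaluator that belongs to the manifest's own spec. */
  public boolean mightMatch(Manifest manifest) {
    return evaluators
        .computeIfAbsent(manifest.specId, id -> bind(specsById.get(id)))
        .test(manifest);
  }

  /** Number of evaluators built so far; at most one per distinct spec ID. */
  public int cachedEvaluators() {
    return evaluators.size();
  }
}
```

In this sketch, a manifest written with an unpartitioned spec (the `struct<>` case in the trace above) is evaluated against its own spec rather than failing to bind the `data` field, and each evaluator is built once per spec ID no matter how many manifests share that spec.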
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]