pvary commented on code in PR #16692:
URL: https://github.com/apache/iceberg/pull/16692#discussion_r3379920077
##########
parquet/src/main/java/org/apache/iceberg/parquet/ParquetMetricsRowGroupFilter.java:
##########
@@ -128,6 +130,27 @@ public Boolean or(Boolean leftResult, Boolean rightResult) {
return leftResult || rightResult;
}
+ @Override
+ @SuppressWarnings("unchecked")
+ public <T> Boolean predicate(BoundPredicate<T> pred) {
+ // A column that is absent from this file but carries an initial-default reads as the default
+ // for every row, not as null. The per-predicate handlers below assume an absent column is all
+ // nulls (valueCount == null), which would skip the row group and drop the backfilled rows
+ // (e.g. for col = <default> or the IsNotNull engines infer). Evaluate such predicates against
+ // the default value instead. See #16690.
+ if (pred.term() instanceof BoundReference) {
+ int id = ((BoundReference<T>) pred.term()).fieldId();
+ if (!valueCounts.containsKey(id)) {
Review Comment:
What if we just don't have statistics for the file?
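To make the concern concrete: `!valueCounts.containsKey(id)` is only a safe signal for "column absent from the file" if a missing entry cannot also mean "column present, but no metrics recorded". Whether that can happen here is the open question. Below is a minimal sketch, not the PR's code, of keeping the two cases apart. `AbsentColumnCheck`, `ColumnState`, `classify`, and the `fileFieldIds` set are hypothetical names for illustration; the assumption is that the set of field IDs physically present in the file schema is available independently of the metrics.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper, for illustration only; not part of Iceberg.
final class AbsentColumnCheck {

  enum ColumnState {
    ABSENT_FROM_FILE,      // column is not in the file schema: initial-default applies
    PRESENT_WITHOUT_COUNT, // column is in the file, but no value count was recorded
    PRESENT_WITH_COUNT     // column is in the file and a value count is available
  }

  // Assumption: fileFieldIds is derived from the file schema, not from metrics,
  // so it stays accurate even when the file carries no statistics at all.
  static ColumnState classify(
      int fieldId, Set<Integer> fileFieldIds, Map<Integer, Long> valueCounts) {
    if (!fileFieldIds.contains(fieldId)) {
      return ColumnState.ABSENT_FROM_FILE;
    }
    if (!valueCounts.containsKey(fieldId)) {
      return ColumnState.PRESENT_WITHOUT_COUNT;
    }
    return ColumnState.PRESENT_WITH_COUNT;
  }

  // Only a column that is truly absent should be evaluated against its default.
  static boolean shouldEvaluateAgainstDefault(
      int fieldId, Set<Integer> fileFieldIds, Map<Integer, Long> valueCounts) {
    return classify(fieldId, fileFieldIds, valueCounts) == ColumnState.ABSENT_FROM_FILE;
  }

  public static void main(String[] args) {
    Set<Integer> fileFieldIds = Set.of(1, 2);
    Map<Integer, Long> valueCounts = Map.of(1, 100L); // field 2 has no recorded count

    System.out.println(classify(1, fileFieldIds, valueCounts));
    System.out.println(classify(2, fileFieldIds, valueCounts));
    System.out.println(classify(3, fileFieldIds, valueCounts));
  }
}
```

With a check like this, a present column that lacks a count would fall through to the conservative "cannot skip" path instead of being treated as backfilled with the default.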