pvary commented on code in PR #16692:
URL: https://github.com/apache/iceberg/pull/16692#discussion_r3362072796
##########
parquet/src/main/java/org/apache/iceberg/parquet/ParquetFilters.java:
##########
@@ -51,6 +56,137 @@ static FilterCompat.Filter convert(Schema schema, Expression expr, boolean caseS
}
}
+ /**
+ * Folds predicates on initial-default columns that are absent from a data file against the column
+ * default, instead of letting them be applied to the (physically missing, hence null) column.
+ *
+ * <p>A column added by schema evolution with an {@code initial-default} is backfilled with the
+ * default at read time, but record-level filtering runs <em>before</em> that injection. For a
+ * file written before the column existed, the record filter would see the column as null and drop
+ * every row, silently removing exactly the rows the default backfills (including via the {@code
+ * IsNotNull} that engines infer for null-intolerant predicates). This evaluates such predicates
+ * against the default value and folds them to {@code alwaysTrue}/{@code alwaysFalse}, the same
+ * way partition predicates are folded out of the residual. Predicates on columns the file
+ * actually contains are returned unchanged so that normal record, stats, dictionary, and bloom
+ * filtering still applies (and still prunes those files on the column's real values).
+ *
+ * @param expr a residual filter expression
+ * @param expectedSchema the table read schema, whose fields carry initial-default values
+ * @param fileColumnIds the field ids physically present in the data file being read
+ * @param caseSensitive whether column resolution is case sensitive
+ * @return the filter with absent initial-default columns folded to their default value
+ */
+ static Expression replaceMissingColumnDefaults(
Review Comment:
Why is this not part of the `ParquetFilters.convert` method?
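The folding the Javadoc describes can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical: `Expr`, `Eq`, `NotNull`, `fold`, and the `Map` of defaults are simplified stand-ins, not Iceberg's `Expression` or visitor API, and the sketch assumes defaults are non-null and that only equality and `IsNotNull` leaves occur. A leaf on a column that is absent from the file but has an initial default is evaluated against that default and replaced by a constant; `And`/`Or` are then simplified so the constants drop out, while leaves on columns the file contains (or absent columns with no default) pass through untouched.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical, simplified model for illustration only; NOT Iceberg's Expression API.
final class DefaultFolding {

  interface Expr {}

  record True() implements Expr {}
  record False() implements Expr {}
  record And(Expr left, Expr right) implements Expr {}
  record Or(Expr left, Expr right) implements Expr {}
  /** Leaf predicate: column {@code fieldId} equals {@code value}. */
  record Eq(int fieldId, Object value) implements Expr {}
  /** Leaf predicate: column {@code fieldId} is not null. */
  record NotNull(int fieldId) implements Expr {}

  static final Expr TRUE = new True();
  static final Expr FALSE = new False();

  /** True when the file lacks the column but the read schema supplies an initial default. */
  private static boolean missingWithDefault(
      int fieldId, Map<Integer, Object> initialDefaults, Set<Integer> fileColumnIds) {
    return !fileColumnIds.contains(fieldId) && initialDefaults.containsKey(fieldId);
  }

  static Expr fold(Expr expr, Map<Integer, Object> initialDefaults, Set<Integer> fileColumnIds) {
    if (expr instanceof And a) {
      Expr l = fold(a.left(), initialDefaults, fileColumnIds);
      Expr r = fold(a.right(), initialDefaults, fileColumnIds);
      if (l instanceof False || r instanceof False) return FALSE;
      if (l instanceof True) return r;
      if (r instanceof True) return l;
      return new And(l, r);
    }
    if (expr instanceof Or o) {
      Expr l = fold(o.left(), initialDefaults, fileColumnIds);
      Expr r = fold(o.right(), initialDefaults, fileColumnIds);
      if (l instanceof True || r instanceof True) return TRUE;
      if (l instanceof False) return r;
      if (r instanceof False) return l;
      return new Or(l, r);
    }
    if (expr instanceof Eq eq && missingWithDefault(eq.fieldId(), initialDefaults, fileColumnIds)) {
      // Evaluate against the default the reader will backfill, not against null.
      return Objects.equals(initialDefaults.get(eq.fieldId()), eq.value()) ? TRUE : FALSE;
    }
    if (expr instanceof NotNull nn
        && missingWithDefault(nn.fieldId(), initialDefaults, fileColumnIds)) {
      // Defaults are assumed non-null, so IsNotNull always holds for backfilled rows.
      return TRUE;
    }
    // Columns present in the file (or absent with no default) keep normal filtering.
    return expr;
  }

  public static void main(String[] args) {
    Map<Integer, Object> defaults = Map.of(3, "unknown"); // field 3 added later with a default
    Set<Integer> fileColumnIds = Set.of(1, 2); // old file predates field 3
    Expr residual = new And(new NotNull(3), new Eq(1, 5));
    System.out.println(fold(residual, defaults, fileColumnIds)); // Eq[fieldId=1, value=5]
  }
}
```

In the example, the engine-inferred `IsNotNull` on the missing column folds to true and disappears, leaving only the predicate on field 1 for the regular record, stats, dictionary, and bloom filters.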