rdblue commented on a change in pull request #1566:
URL: https://github.com/apache/iceberg/pull/1566#discussion_r516902119
##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java
##########
@@ -623,17 +630,10 @@ public ReadBuilder withNameMapping(NameMapping newNameMapping) {
if (filter != null) {
// TODO: should not need to get the schema to push down before opening the file.
// Parquet should allow setting a filter inside its read support
- MessageType type;
Review comment:
The comment above suggests another possible reason why we wanted to
reimplement filters: tailoring the filter to each file is more difficult than
evaluating the same filter against every file.
If I remember correctly, the filter has to be tailored to each file because
we use id-based column resolution. The file might contain `1: a int` while
Iceberg has a filter on `1: x long`. Unless the filter is translated to use `a`
instead of `x`, Parquet will skip the file because it thinks the column does
not exist (and is therefore all nulls).
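
To make the renaming step concrete, here is a rough sketch of the id-based lookup. It is not the code in `Parquet.java`: the class `FilterRenameSketch`, the method `fileColumnFor`, and the two maps are hypothetical stand-ins for the table schema and the file schema, and the real translation works on filter expressions rather than on bare column names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative only: these are not Iceberg or Parquet classes.
class FilterRenameSketch {

  // Resolve the table's column name to the name this file uses for the same
  // field id. Returns empty when the table has no such column or the file has
  // no column with that id, i.e. the column is genuinely missing from the file.
  static Optional<String> fileColumnFor(
      String tableColumn,
      Map<String, Integer> tableIdsByName,
      Map<Integer, String> fileNamesById) {
    Integer fieldId = tableIdsByName.get(tableColumn);
    if (fieldId == null) {
      return Optional.empty();
    }
    return Optional.ofNullable(fileNamesById.get(fieldId));
  }

  public static void main(String[] args) {
    Map<String, Integer> tableIdsByName = new HashMap<>();
    tableIdsByName.put("x", 1); // table schema: 1: x long

    Map<Integer, String> fileNamesById = new HashMap<>();
    fileNamesById.put(1, "a"); // file schema: 1: a int

    // The filter is written against "x"; the file needs it written against "a".
    String tableColumn = "x";
    String fileColumn =
        fileColumnFor(tableColumn, tableIdsByName, fileNamesById).orElse(null);
    System.out.println(tableColumn + " -> " + fileColumn);
  }
}
```

Only when the lookup comes back empty is the column really absent from the file. Passing the untranslated name `x` to Parquet would make a present column look absent.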