[GitHub] [iceberg] rdblue commented on a change in pull request #1566: Parquet: Support Page Skipping in Iceberg Parquet Reader

GitBox Tue, 03 Nov 2020 11:23:19 -0800


rdblue commented on a change in pull request #1566:
URL: https://github.com/apache/iceberg/pull/1566#discussion_r516902119




##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java
##########
@@ -623,17 +630,10 @@ public ReadBuilder withNameMapping(NameMapping newNameMapping) {
       if (filter != null) {
         // TODO: should not need to get the schema to push down before opening the file.
         // Parquet should allow setting a filter inside its read support
-        MessageType type;

Review comment:
   The comment above sounds like another possible reason why we wanted to 
reimplement filters: tailoring the filter to each file is difficult compared 
to evaluating the same filter for every file.
   
   If I remember correctly, the filter needs to be tailored to the file 
because we use id-based column resolution. The file might contain `1: a int` 
while Iceberg has a filter for `1: x long`. Unless the filter is translated to 
use `a` instead of `x`, Parquet will skip the file because it thinks the 
column doesn't exist (and is therefore all nulls).
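
   A minimal sketch of the id-based translation described above, written in 
plain Java. It assumes the table schema and the file schema are each reduced 
to a simple map keyed on field id; `FilterNameTranslation`, `Predicate`, and 
`rewriteForFile` are hypothetical names used only for illustration and are not 
Iceberg or Parquet APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; these are not Iceberg or Parquet classes.
final class FilterNameTranslation {

  // A minimal predicate of the form "<column> <op> <value>".
  static final class Predicate {
    final String column;
    final String op;
    final long value;

    Predicate(String column, String op, long value) {
      this.column = column;
      this.op = op;
      this.value = value;
    }

    @Override
    public String toString() {
      return column + " " + op + " " + value;
    }
  }

  // Rewrites a predicate expressed with table column names so that it uses the
  // name the file stores for the same field id. Returns empty when the file has
  // no column with that id, i.e. the column really is missing from the file.
  static Optional<Predicate> rewriteForFile(
      Predicate tablePredicate,
      Map<String, Integer> tableNameToId,
      Map<Integer, String> fileIdToName) {
    Integer fieldId = tableNameToId.get(tablePredicate.column);
    if (fieldId == null) {
      throw new IllegalArgumentException("Unknown table column: " + tablePredicate.column);
    }
    String fileName = fileIdToName.get(fieldId);
    if (fileName == null) {
      return Optional.empty();
    }
    return Optional.of(new Predicate(fileName, tablePredicate.op, tablePredicate.value));
  }

  public static void main(String[] args) {
    // Table schema has `1: x long`; the file was written with `1: a int`.
    Map<String, Integer> tableNameToId = new HashMap<>();
    tableNameToId.put("x", 1);
    Map<Integer, String> fileIdToName = new HashMap<>();
    fileIdToName.put(1, "a");

    Predicate tableFilter = new Predicate("x", ">", 5L);
    // The filter handed to the file reader must refer to `a`, not `x`.
    System.out.println(rewriteForFile(tableFilter, tableNameToId, fileIdToName));
  }
}
```

   Without this per-file rewrite, a name-based lookup for `x` finds nothing in 
the file, which is the case where the file would be skipped incorrectly.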




