[GitHub] [iceberg] Fokko commented on issue #7022: Flink: filters not applied at row-level for non-partition columns

via GitHub Sat, 11 Mar 2023 11:55:04 -0800


Fokko commented on issue #7022:
URL: https://github.com/apache/iceberg/issues/7022#issuecomment-1464999508


   In other engines, the source also filters down to the row level, rather 
than only to the row-group level as Flink does now.
   
   I think we need to do a few things:
   
   - When projecting the schema that's being passed down to the Parquet reader, 
make sure that the fields being filtered on are also read.
   - Read as we do now, which will do the partition pruning, metrics 
evaluation, and row group filtering.
   - Filter the rows using Flink, since it is probably heavily optimized (and 
we don't want to reinvent the wheel here).
   - Do a final projection to the requested schema that will exclude the fields 
that are part of the filter, but not part of the selected columns.
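
   The four steps above can be sketched roughly as follows. This is only an 
illustration in plain Python over in-memory dicts, not Iceberg or Flink code: 
the names `read_projection`, `scan`, `filter_and_project`, and `ROW_GROUP` are 
hypothetical, and `scan` merely stands in for the existing read path, which 
can return whole row groups that still contain non-matching rows.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Set

Row = Dict[str, object]


def read_projection(requested: List[str], filter_columns: Set[str]) -> List[str]:
    """Step 1: widen the projection so every filtered column is read."""
    extra = sorted(c for c in filter_columns if c not in requested)
    return list(requested) + extra


def scan(rows: Iterable[Row], columns: List[str]) -> Iterator[Row]:
    """Step 2 (stand-in): the existing read path. Pruning here is coarse,
    so rows that do not match the filter can still come through."""
    for row in rows:
        yield {c: row[c] for c in columns}


def filter_and_project(
    rows: Iterable[Row],
    requested: List[str],
    filter_columns: Set[str],
    predicate: Callable[[Row], bool],
) -> Iterator[Row]:
    columns = read_projection(requested, filter_columns)
    for row in scan(rows, columns):
        # Step 3: row-level filter (in the proposal, done by the engine).
        if predicate(row):
            # Step 4: final projection drops filter-only columns.
            yield {c: row[c] for c in requested}


# Hypothetical data: one row group that survived row-group filtering.
ROW_GROUP = [
    {"id": 1, "name": "ada", "age": 36},
    {"id": 2, "name": "bob", "age": 25},
    {"id": 3, "name": "cy", "age": 41},
]

# SELECT name ... WHERE age > 30
result = list(
    filter_and_project(ROW_GROUP, ["name"], {"age"}, lambda r: r["age"] > 30)
)
```

   Here `age` is read only because the filter needs it, and it is dropped 
again in the final projection, so `result` contains just the `name` field of 
the matching rows.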


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


