Fokko commented on issue #7022: URL: https://github.com/apache/iceberg/issues/7022#issuecomment-1464999508
For other engines, the source also filters down to the row level, instead of the row group level as it does now. I think we need to do a few things:
- When projecting the schema that is passed down to the Parquet reader, make sure that the fields being filtered on are also read.
- Read as we do now, which does the partition pruning, metrics evaluation, and row group filtering.
- Filter the remaining rows using Flink, since it is probably heavily optimized (and we don't want to reinvent the wheel here).
- Do a final projection to the requested schema, which excludes the fields that are part of the filter but not part of the selected columns.
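The four steps above can be sketched in plain Java to show how the read schema, the row-level filter and the final projection relate. This is a minimal, hypothetical sketch, not Iceberg or Flink code: rows are modeled as `Map<String, Object>` instead of Flink `RowData`, a `java.util.function.Predicate` stands in for the Flink-side filter evaluation, and `readSchema` / `filterAndProject` are illustrative names, not existing APIs. Step 2 (the existing read with partition pruning, metrics evaluation and row group filtering) is not modeled; its output is represented by an in-memory list of rows.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch: Map-based rows and a Predicate stand in for Flink RowData
// and Flink's own filter evaluation.
public class RowLevelFilterSketch {

    // Step 1: widen the projection so every column referenced by the filter is read.
    // A LinkedHashSet keeps the selected columns first and avoids duplicates.
    static List<String> readSchema(List<String> selected, Set<String> filterColumns) {
        Set<String> columns = new LinkedHashSet<>(selected);
        columns.addAll(filterColumns);
        return new ArrayList<>(columns);
    }

    // Steps 3 and 4: drop rows that fail the row-level filter, then project each
    // surviving row back to the columns the query actually selected.
    static List<Map<String, Object>> filterAndProject(
            List<Map<String, Object>> rows,
            Predicate<Map<String, Object>> rowFilter,
            List<String> selected) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            if (!rowFilter.test(row)) {
                continue; // rows that row group statistics could not rule out are removed here
            }
            Map<String, Object> projected = new LinkedHashMap<>();
            for (String column : selected) {
                projected.put(column, row.get(column)); // filter-only columns are not copied
            }
            out.add(projected);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> selected = List.of("id", "name");
        List<String> toRead = readSchema(selected, Set.of("ts"));
        System.out.println(toRead); // [id, name, ts]

        // Stand-in for step 2: rows as the reader might return them after pruning.
        List<Map<String, Object>> rows = List.of(
                Map.<String, Object>of("id", 1, "name", "a", "ts", 10L),
                Map.<String, Object>of("id", 2, "name", "b", "ts", 20L));
        Predicate<Map<String, Object>> tsAtLeast15 = r -> (Long) r.get("ts") >= 15L;
        System.out.println(filterAndProject(rows, tsAtLeast15, selected)); // [{id=2, name=b}]
    }
}
```

The design choice mirrors the comment: the filter column `ts` is read only so the predicate can be evaluated, and it disappears again in the final projection, so the caller sees exactly the schema it requested.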
