Fokko commented on issue #7022: URL: https://github.com/apache/iceberg/issues/7022#issuecomment-1456662225
I did some extensive digging into this today. It looks like the filter operation returns residuals, which means that row groups that may contain valid rows are read as a whole, including the rows that don't match the filter. Fixing this requires quite an overhaul of the code. Currently, we read everything directly using `FlinkParquetReaders` and don't filter the rows afterward. If a column isn't part of the requested schema, we can't filter on it afterward, so we either have to make sure that the filter columns are included in the projection, or skip the rows directly while reading.
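
To illustrate the behavior described above, here is a minimal, self-contained sketch in plain Java. It does not use the Iceberg or Parquet APIs: `RowGroup`, `pruneRowGroups`, `readWithoutResidual`, and `readWithResidual` are hypothetical names made up for this example, and the single-column `value > threshold` predicate is an assumption. It only shows why pruning by statistics still leaves non-matching rows when no residual filter is applied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ResidualFilterSketch {

    // Hypothetical stand-in for a row group holding one non-empty int column.
    static final class RowGroup {
        final int[] values;

        RowGroup(int... values) {
            this.values = values;
        }

        // Mimics the max statistic a file format stores per row group.
        int max() {
            return Arrays.stream(values).max().getAsInt();
        }
    }

    // Stats-based pruning for the predicate "value > threshold".
    // A group is kept when it MAY contain matching rows, so a kept
    // group can still hold rows that do not match.
    static List<RowGroup> pruneRowGroups(List<RowGroup> groups, int threshold) {
        List<RowGroup> kept = new ArrayList<>();
        for (RowGroup group : groups) {
            if (group.max() > threshold) {
                kept.add(group);
            }
        }
        return kept;
    }

    // Reads every row of the kept groups; no residual filter is applied.
    static List<Integer> readWithoutResidual(List<RowGroup> groups, int threshold) {
        List<Integer> rows = new ArrayList<>();
        for (RowGroup group : pruneRowGroups(groups, threshold)) {
            for (int value : group.values) {
                rows.add(value);
            }
        }
        return rows;
    }

    // Applies the residual predicate per row. This is only possible
    // because the filter column is part of what was read.
    static List<Integer> readWithResidual(List<RowGroup> groups, int threshold) {
        List<Integer> rows = new ArrayList<>();
        for (int value : readWithoutResidual(groups, threshold)) {
            if (value > threshold) {
                rows.add(value);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<RowGroup> groups =
            Arrays.asList(new RowGroup(1, 2, 3), new RowGroup(4, 6, 8));
        System.out.println(readWithoutResidual(groups, 5)); // [4, 6, 8]
        System.out.println(readWithResidual(groups, 5));    // [6, 8]
    }
}
```

In this sketch the first row group is skipped entirely, but the second one is returned in full, including the row with value 4, unless the residual is evaluated per row.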
