[GitHub] [iceberg] Fokko commented on issue #7022: Function filters() is unless on flink datastream api when iceberg table is stored by parquet format

via GitHub Mon, 06 Mar 2023 10:02:26 -0800


Fokko commented on issue #7022:
URL: https://github.com/apache/iceberg/issues/7022#issuecomment-1456662225


   I did some extensive digging into this today. It looks like the filter 
operation returns residuals, which means that row groups that may contain 
matching rows are read as a whole, including the rows that do not match. 
Fixing this requires a substantial overhaul of the code. 
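
   To illustrate the difference between pruning row groups and applying the 
residual, here is a minimal sketch in plain Java. It does not use the Iceberg 
or Parquet APIs; `ResidualSketch`, `RowGroup`, and both `read...` methods are 
hypothetical names made up for this example, and the filter is assumed to be 
a simple equality on one integer column.

```java
import java.util.ArrayList;
import java.util.List;

public class ResidualSketch {
    // Hypothetical stand-in for a Parquet row group: min/max stats plus its values.
    public static final class RowGroup {
        final int min;
        final int max;
        final List<Integer> values;

        public RowGroup(int min, int max, List<Integer> values) {
            this.min = min;
            this.max = max;
            this.values = values;
        }
    }

    // Stats-based pruning: a row group survives if it MAY contain the target.
    static boolean mightContain(RowGroup group, int target) {
        return group.min <= target && target <= group.max;
    }

    // Reads every surviving row group as a whole; no per-row filtering.
    public static List<Integer> readWithoutResidual(List<RowGroup> groups, int target) {
        List<Integer> out = new ArrayList<>();
        for (RowGroup group : groups) {
            if (mightContain(group, target)) {
                out.addAll(group.values);
            }
        }
        return out;
    }

    // Same read, but the residual (value == target) is applied to each row.
    public static List<Integer> readWithResidual(List<RowGroup> groups, int target) {
        List<Integer> out = new ArrayList<>();
        for (int value : readWithoutResidual(groups, target)) {
            if (value == target) {
                out.add(value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<RowGroup> groups = List.of(
            new RowGroup(1, 5, List.of(1, 3, 5)),
            new RowGroup(10, 20, List.of(10, 15, 20)));
        System.out.println(readWithoutResidual(groups, 3)); // prints [1, 3, 5]
        System.out.println(readWithResidual(groups, 3));    // prints [3]
    }
}
```

   The second row group is skipped from its statistics alone, but the first 
one is returned in full unless the residual is evaluated per row.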
   
   Currently, we read everything directly using `FlinkParquetReaders` without 
applying any filter. If a column isn't part of the requested schema, then 
we can't filter on it afterward, so we either have to make sure that the 
filter columns are included in the read, or skip rows directly while reading.
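
   The first of those two options could look roughly like the sketch below. 
This is plain Java, not the Iceberg or Flink reader code: `ProjectionSketch`, 
`read`, and `readFiltered` are hypothetical names, rows are modeled as maps, 
and the filter is assumed to be an equality check on a single column.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

public class ProjectionSketch {
    // Hypothetical projected read: keeps only the given columns of each row.
    public static List<Map<String, Object>> read(
            List<Map<String, Object>> rows, Set<String> columns) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            Map<String, Object> projected = new LinkedHashMap<>();
            for (String column : columns) {
                if (row.containsKey(column)) {
                    projected.put(column, row.get(column));
                }
            }
            out.add(projected);
        }
        return out;
    }

    // Widens the read with the filter column, filters, then drops it again.
    public static List<Map<String, Object>> readFiltered(
            List<Map<String, Object>> rows, Set<String> requested,
            String filterColumn, Object expected) {
        Set<String> widened = new LinkedHashSet<>(requested);
        widened.add(filterColumn);

        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : read(rows, widened)) {
            if (Objects.equals(row.get(filterColumn), expected)) {
                Map<String, Object> projected = new LinkedHashMap<>(row);
                projected.keySet().retainAll(requested);
                out.add(projected);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(
            Map.<String, Object>of("id", 1, "name", "a"),
            Map.<String, Object>of("id", 2, "name", "b"));
        // With only "name" projected, "id" is gone and cannot be filtered on.
        System.out.println(read(rows, Set.of("name")));
        // Widening the read makes the filter on "id" possible.
        System.out.println(readFiltered(rows, Set.of("name"), "id", 2));
    }
}
```

   The alternative mentioned above, skipping rows while reading, would avoid 
materializing the extra column but needs support inside the reader itself.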


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

