rdblue commented on pull request #1566:
URL: https://github.com/apache/iceberg/pull/1566#issuecomment-721317727
There are a few reasons why Iceberg reimplemented the filters.
* Faster iteration and more features without needing Parquet releases
* Parquet's filter API has some problems:
  * Evaluation is negated (`canDrop` vs `shouldRead`), which has led to more
    bugs
  * It is missing some predicates that we need to be well supported, like
    `startsWith`, `in`, `alwaysTrue`, and `alwaysFalse`
  * The API is very difficult to work with
* Iceberg replaces record materialization, so we would need to run these
filters from Iceberg code anyway
* Iceberg had already implemented similar filters, like stats evaluation, so
it was simple to reuse that code
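
To illustrate the negation point, here is a minimal sketch of the same min/max stats check written in both polarities. The class, the `Stats` type, and every method name below are hypothetical and made up for this example; none of this is the actual Parquet or Iceberg API. It only shows why the `canDrop` form is easier to get wrong: every comparison flips, and an AND of two predicates has to be combined with OR.

```java
// Hypothetical sketch: all names are illustrative, not the Parquet or Iceberg API.
final class FilterPolarityDemo {
  /** Min/max column statistics for one row group (assumed shape). */
  static final class Stats {
    final int min;
    final int max;

    Stats(int min, int max) {
      this.min = min;
      this.max = max;
    }
  }

  // "Should read" polarity: true means some row might match `col > value`.
  static boolean shouldReadGreaterThan(Stats stats, int value) {
    return stats.max > value;
  }

  // "Should read" polarity: true means some row might match `col < value`.
  static boolean shouldReadLessThan(Stats stats, int value) {
    return stats.min < value;
  }

  // "Can drop" polarity: true means no row can match `col > value`.
  static boolean canDropGreaterThan(Stats stats, int value) {
    return stats.max <= value;
  }

  // "Can drop" polarity: true means no row can match `col < value`.
  static boolean canDropLessThan(Stats stats, int value) {
    return stats.min >= value;
  }

  // `col > lower AND col < upper`: read only if both sides might match.
  static boolean shouldReadRange(Stats stats, int lower, int upper) {
    return shouldReadGreaterThan(stats, lower) && shouldReadLessThan(stats, upper);
  }

  // Same predicate, negated polarity: drop if either side can be dropped.
  // The AND in the predicate becomes an OR here (De Morgan).
  static boolean canDropRange(Stats stats, int lower, int upper) {
    return canDropGreaterThan(stats, lower) || canDropLessThan(stats, upper);
  }

  public static void main(String[] args) {
    Stats stats = new Stats(10, 20);
    System.out.println(shouldReadRange(stats, 15, 30)); // true
    System.out.println(canDropRange(stats, 15, 30));    // false
    System.out.println(shouldReadRange(stats, 20, 30)); // false: max 20 is not > 20
    System.out.println(canDropRange(stats, 20, 30));    // true
  }
}
```

In this sketch the two forms always return opposite answers, but only the `shouldRead` form mirrors the structure of the predicate being evaluated.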
To fix some of the issues with the Parquet API, my hope was that eventually
Parquet would use Iceberg's expression API and filters in place of its own.
We'd need to refactor a bit to make this happen, but I think it would still be
a good option. There are several things that I think would be great to
standardize across some of the storage projects, like the `FileIO` classes and
the expressions.