[GitHub] [iceberg] dramaticlly commented on pull request #8106: Core: push down filters when evaluating entries in metadata tables

via GitHub Mon, 31 Jul 2023 18:09:50 -0700


dramaticlly commented on PR #8106:
URL: https://github.com/apache/iceberg/pull/8106#issuecomment-1659415919

Discussed with @szehon-ho offline. It looks like this patch, which aims to
filter manifest entries, causes a problem when a column projection on only the
`.partitions` column and a partition filter are used together when querying
metadata tables. It throws a NullPointerException because recordCount is
missing when the MetricsEvaluator runs.
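
To illustrate the failure mode, here is a minimal sketch of how a missing
record count can surface as a NullPointerException. `ProjectedEntry` and
`RecordCountSketch` are hypothetical stand-ins written for this example, not
Iceberg classes, and the assumption that the count is stored as a nullable
`Long` and unboxed by the evaluator is mine rather than something confirmed in
this thread.

```java
// Hypothetical stand-in for a manifest entry read with a column projection.
// Not an Iceberg class.
final class ProjectedEntry {
    // Assumption: the count is null when the column was not projected.
    private final Long recordCount;

    ProjectedEntry(Long recordCount) {
        this.recordCount = recordCount;
    }

    Long recordCount() {
        return recordCount;
    }
}

// Hypothetical evaluator that assumes the record count is always present.
final class RecordCountSketch {
    static boolean mightMatch(ProjectedEntry entry) {
        // Auto-unboxing a null Long throws NullPointerException here.
        long count = entry.recordCount();
        return count > 0;
    }

    public static void main(String[] args) {
        // The projection included the record count: evaluation works.
        System.out.println(mightMatch(new ProjectedEntry(5L)));

        // The projection covered only the partition column: the count is null.
        try {
            mightMatch(new ProjectedEntry(null));
        } catch (NullPointerException e) {
            System.out.println("NPE: record count was not projected");
        }
    }
}
```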

Behind the scenes, when we try to push down the partition filter in
[ManifestReader](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/ManifestReader.java#L301),
it filters partitions as rows and thus requires column stats to evaluate; at a
minimum, the record count is required by such a metrics evaluator. We went
through a few options, such as adding a virtual column (`record_count`) to the
projection and removing it before returning, similar to how ReadableMetrics was
handled in #7364. But this adds a lot of complexity to the code without
noticeable gain, as in-memory filtering of manifest entries in Iceberg is not
significantly faster than filtering in Spark.

