dramaticlly commented on PR #8106: URL: https://github.com/apache/iceberg/pull/8106#issuecomment-1659415919
Discussed with @szehon-ho offline. It looks like this patch, which aims to filter manifest entries, causes a problem when querying metadata tables with both a column projection on only the `.partitions` column and a partition filter. It throws a `NullPointerException` because `recordCount` is missing when the MetricsEvaluator runs. Behind the scenes, when we try to push down the partition filter in [ManifestReader](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/ManifestReader.java#L301), it filters partitions as rows and therefore needs column stats to evaluate; at a minimum, the metrics evaluator requires the record count. We went through a few options, such as adding a virtual column (`record_count`) to the projection and removing it before results are returned, similar to how ReadableMetrics was handled in #7364. But this adds a lot of complexity to the code without noticeable gain, as in-memory filtering of manifest entries in Iceberg is not significantly faster than filtering in Spark.
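For anyone following along, here is a minimal sketch of the failure mode as I understand it. It is not Iceberg code: `ProjectedEntry`, `mightMatch`, and `failsWithNpe` are hypothetical names, and it assumes the NPE comes from unboxing a record count that was left null because the column was not in the projection.

```java
public class ProjectionNpeSketch {

  // Hypothetical stand-in for a manifest entry read with a column projection.
  // Columns that were not projected are simply left null.
  static final class ProjectedEntry {
    final String partition;  // projected
    final Long recordCount;  // null when the column was not projected

    ProjectedEntry(String partition, Long recordCount) {
      this.partition = partition;
      this.recordCount = recordCount;
    }

    // Primitive return type: unboxing a null Long throws NullPointerException.
    long recordCount() {
      return recordCount;
    }
  }

  // Hypothetical metrics-style check (an assumption, not the real evaluator):
  // a file with no rows can never match, so the record count is read first.
  static boolean mightMatch(ProjectedEntry entry, String wantedPartition) {
    if (entry.recordCount() <= 0) {
      return false;
    }
    return entry.partition.equals(wantedPartition);
  }

  // Reports whether evaluating the entry fails with a NullPointerException.
  static boolean failsWithNpe(ProjectedEntry entry, String wantedPartition) {
    try {
      mightMatch(entry, wantedPartition);
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    ProjectedEntry full = new ProjectedEntry("dt=2023-07-31", 10L);
    ProjectedEntry partitionsOnly = new ProjectedEntry("dt=2023-07-31", null);

    System.out.println(mightMatch(full, "dt=2023-07-31"));             // true
    System.out.println(failsWithNpe(partitionsOnly, "dt=2023-07-31")); // true
  }
}
```

The second entry models a read that projects only the partition column: the filter itself needs only the partition value, but the evaluation still touches the record count first, which is why the projection and the filter pushdown conflict.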
