dramaticlly commented on PR #8106:
URL: https://github.com/apache/iceberg/pull/8106#issuecomment-1659415919

   Discussed with @szehon-ho offline. It looks like this patch, which aims to 
filter manifest entries, causes a problem when a column projection on only the 
`.partitions` column and a partition filter are used together while querying 
metadata tables. The MetricsEvaluator throws a NullPointerException because 
recordCount is missing.
   
   Behind the scenes, when we try to push the partition filter down to 
[ManifestReader](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/ManifestReader.java#L301),
 it filters partitions as rows and therefore needs column stats to evaluate the 
filter; at a minimum, the metrics evaluator requires the record count. We went 
through a few options, such as adding a virtual column (`record_count`) to the 
projection and removing it before the rows are returned, similar to how 
ReadableMetrics was handled in #7364. But this adds a lot of complexity to the 
code without noticeable gain, as in-memory filtering of manifest entries in 
Iceberg is not significantly faster than filtering in Spark.
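
   For illustration only, the virtual-column option could be sketched roughly 
as follows. This is plain Python over dicts, not Iceberg's Java API: 
`scan_entries`, the dict layout, and the sample file names are all hypothetical, 
and the zero-record check is just an assumed stand-in for whatever the real 
metrics evaluator does with the record count.

```python
# Hypothetical sketch: plain dicts stand in for manifest entries.
# None of these names come from Iceberg itself.

def scan_entries(entries, projection, row_filter):
    """Return entries projected to `projection` that pass `row_filter`.

    The filter step needs `record_count`, so it is read as a "virtual"
    column when the caller did not project it, then dropped again
    before the row is returned.
    """
    needs_virtual = "record_count" not in projection
    read_columns = list(projection)
    if needs_virtual:
        read_columns.append("record_count")

    results = []
    for entry in entries:
        row = {name: entry[name] for name in read_columns}
        # Assumed evaluator behaviour: a file with no records cannot match.
        if row["record_count"] == 0:
            continue
        if not row_filter(row):
            continue
        if needs_virtual:
            # Strip the helper column so the caller sees only what it asked for.
            del row["record_count"]
        results.append(row)
    return results


entries = [
    {"file_path": "a.parquet", "partition": {"dt": "2023-07-01"}, "record_count": 10},
    {"file_path": "b.parquet", "partition": {"dt": "2023-07-02"}, "record_count": 5},
    {"file_path": "c.parquet", "partition": {"dt": "2023-07-01"}, "record_count": 0},
]

rows = scan_entries(
    entries,
    projection=["partition"],
    row_filter=lambda row: row["partition"]["dt"] == "2023-07-01",
)
```

   Even in this toy form, every read path has to track whether the helper 
column was requested by the caller or added internally, which is the kind of 
bookkeeping that made this option unattractive.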


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

