asolimando commented on PR #20292: URL: https://github.com/apache/datafusion/pull/20292#issuecomment-3985410056
I just realized that the parquet files are missing the `distinct_count` statistics. Would it be possible to update them to add it? I am asking with #19957 in mind (and future NDV-based improvements): it would be nice to be able to measure the accuracy improvement from NDV-aware cardinality estimation directly when this PR gets merged. Of course, it's fine to postpone this to a follow-up PR; I'm asking in case you agree it's interesting and not too much work. (IIRC, DuckDB can write parquet files with `distinct_count` set.)

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
