asolimando commented on PR #20292:
URL: https://github.com/apache/datafusion/pull/20292#issuecomment-3985410056

   I just realized that the parquet files are missing the `distinct_count`
statistics. Would it be possible to update them to add it?
   
   I am asking with #19957 in mind (and future NDV-based improvements).
It would be nice to be able to measure the accuracy improvement from NDV-aware
cardinality estimation directly when this PR gets merged.
   
   Of course it's fine to postpone this to a follow-up PR; I'm asking in case
you agree it's interesting and not too much work.
   
   (IIRC DuckDB allows writing parquet files with `distinct_count` set)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

