Dandandan commented on pull request #512:
URL: https://github.com/apache/arrow-rs/pull/512#issuecomment-871723163


   AFAIK the distinct count is often left out of Parquet stats because it is 
expensive to calculate.
   
   The distinct count calculation in DataFusion is not really optimized yet 
(and uses quite a lot of memory), so I'm not sure it would be very useful for 
Arrow to reuse.
   
   Also, for DataFusion it would need to work over multiple arrays, whereas 
in Arrow it could maybe be for a single array? I think it would be great to 
have a kernel that DataFusion can use.
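
   To make the single-array vs. multiple-array point concrete, here is a rough 
sketch using only the standard library. It is not the arrow-rs or DataFusion 
API: `distinct_count` and `distinct_count_chunks` are hypothetical names, and a 
slice of `Option<i64>` stands in for a nullable Arrow array.

```rust
use std::collections::HashSet;

/// Count distinct non-null values across several chunks.
/// Hypothetical helper: each `&[Option<i64>]` stands in for one nullable array.
fn distinct_count_chunks(chunks: &[&[Option<i64>]]) -> usize {
    // Every distinct value is kept in the set, so memory grows with cardinality.
    let mut seen: HashSet<i64> = HashSet::new();
    for chunk in chunks {
        // `flatten` skips the `None` (null) slots.
        seen.extend(chunk.iter().flatten().copied());
    }
    seen.len()
}

/// Single-array case, expressed as the multi-chunk case with one chunk.
fn distinct_count(chunk: &[Option<i64>]) -> usize {
    distinct_count_chunks(&[chunk])
}
```

   In this sketch nulls are simply ignored, and the set is shared across 
chunks, so a value that appears in two arrays is counted once. The hash set is 
also where the memory cost comes from.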


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

