crepererum commented on pull request #512:
URL: https://github.com/apache/arrow-rs/pull/512#issuecomment-872779411


   For the distinct count, but also for the stats in general: what's kinda 
unfortunate is that in IOx we already have most of this information for the 
record batches before we write them to parquet. For the min/max values and 
null counts I think recomputing them is OK, but for the distinct count it 
seems like a bit of a waste.
   
   So in some future PR (which I can contribute) I would like to add the 
ability to pass through pre-calculated stats.
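   To make the idea concrete, here is a minimal, self-contained sketch of such a pass-through, assuming a single nullable i64 column. `ColumnStats`, `compute_stats` and `stats_for_column` are hypothetical names for illustration only; they are NOT the `parquet` crate's `Statistics` type or any existing writer API. It shows why the distinct count is the expensive part: min/max/null count are a cheap single pass, while the distinct count needs a hash set over every non-null value.

```rust
use std::collections::HashSet;

/// Hypothetical per-column statistics for an i64 column.
/// NOT the `parquet` crate's `Statistics` type; illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct ColumnStats {
    pub min: Option<i64>,
    pub max: Option<i64>,
    pub null_count: u64,
    pub distinct_count: Option<u64>,
}

/// Computes stats from scratch. min/max/null count are a cheap single pass;
/// the distinct count needs a hash set over all non-null values, which is
/// the part that is wasteful to redo when the caller already knows it.
pub fn compute_stats(values: &[Option<i64>]) -> ColumnStats {
    let mut min: Option<i64> = None;
    let mut max: Option<i64> = None;
    let mut null_count = 0u64;
    let mut distinct = HashSet::new();
    for v in values {
        match v {
            None => null_count += 1,
            Some(x) => {
                min = Some(min.map_or(*x, |m| m.min(*x)));
                max = Some(max.map_or(*x, |m| m.max(*x)));
                distinct.insert(*x);
            }
        }
    }
    ColumnStats {
        min,
        max,
        null_count,
        distinct_count: Some(distinct.len() as u64),
    }
}

/// Hypothetical pass-through hook: if the caller supplies pre-calculated
/// stats, use them as-is; otherwise fall back to computing them.
pub fn stats_for_column(values: &[Option<i64>], precomputed: Option<ColumnStats>) -> ColumnStats {
    precomputed.unwrap_or_else(|| compute_stats(values))
}

fn main() {
    let values: [Option<i64>; 4] = [Some(3), None, Some(1), Some(3)];

    // No pre-computed stats: everything is recomputed, including the distinct count.
    let computed = stats_for_column(&values, None);
    println!("{:?}", computed);

    // Pre-computed stats (e.g. already known before writing): passed through untouched.
    let known = ColumnStats { min: Some(1), max: Some(3), null_count: 1, distinct_count: Some(2) };
    let passed = stats_for_column(&values, Some(known.clone()));
    assert_eq!(passed, known);
}
```

   A real API would presumably take the stats per column chunk / row group and would have to decide whether to trust or validate caller-provided values; the sketch simply trusts them.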
   
   Furthermore, the "pass through pre-computed stats" work might also be a good 
point to find some arrow-type-level representation of the stats, because 
currently, if you want to consume the stats from parquet, you have to do the 
scalar physical=>logical type conversion yourself.
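   As an illustration of that conversion, here is a minimal, self-contained sketch covering a few parquet combinations: INT(8) stored as physical INT32, UINT_32 stored as INT32, DATE stored as INT32 (days since the Unix epoch), and DECIMAL stored as INT32/INT64. The types `PhysicalScalar`, `Logical`, `ArrowScalar` and the function `to_arrow` are hypothetical and are NOT the parquet or arrow crate APIs; an arrow-type-level stats representation would essentially do this work once, inside the library.

```rust
/// Hypothetical physical stat value as read from a parquet column chunk
/// (only two of parquet's physical types, for brevity).
#[derive(Debug, Clone, Copy, PartialEq)]
enum PhysicalScalar {
    Int32(i32),
    Int64(i64),
}

/// Hypothetical subset of parquet logical annotations.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Logical {
    Int8,
    UInt32,
    Date,
    Decimal { precision: u8, scale: i8 },
    Plain, // no annotation: use the physical type as-is
}

/// Hypothetical arrow-type-level scalar, loosely mirroring arrow DataTypes.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ArrowScalar {
    Int8(i8),
    Int32(i32),
    Int64(i64),
    UInt32(u32),
    Date32(i32),
    Decimal128 { value: i128, precision: u8, scale: i8 },
}

fn to_arrow(p: PhysicalScalar, l: Logical) -> Result<ArrowScalar, String> {
    match (p, l) {
        // INT(8) is stored as physical INT32, so the stat must be narrowed.
        (PhysicalScalar::Int32(v), Logical::Int8) => {
            if v >= i8::MIN as i32 && v <= i8::MAX as i32 {
                Ok(ArrowScalar::Int8(v as i8))
            } else {
                Err(format!("{} does not fit into Int8", v))
            }
        }
        // UINT_32 is stored in INT32: reinterpret the bits, do not value-convert.
        (PhysicalScalar::Int32(v), Logical::UInt32) => Ok(ArrowScalar::UInt32(v as u32)),
        // DATE is days since the Unix epoch, which is also what arrow's Date32 holds.
        (PhysicalScalar::Int32(v), Logical::Date) => Ok(ArrowScalar::Date32(v)),
        // DECIMAL stats are the unscaled integer; precision/scale come from the schema.
        (PhysicalScalar::Int32(v), Logical::Decimal { precision, scale }) => {
            Ok(ArrowScalar::Decimal128 { value: v as i128, precision, scale })
        }
        (PhysicalScalar::Int64(v), Logical::Decimal { precision, scale }) => {
            Ok(ArrowScalar::Decimal128 { value: v as i128, precision, scale })
        }
        (PhysicalScalar::Int32(v), Logical::Plain) => Ok(ArrowScalar::Int32(v)),
        (PhysicalScalar::Int64(v), Logical::Plain) => Ok(ArrowScalar::Int64(v)),
        (p, l) => Err(format!("unsupported combination: {:?} / {:?}", p, l)),
    }
}

fn main() {
    // A DATE column chunk whose physical min/max statistics are INT32 values.
    let (min, max) = (PhysicalScalar::Int32(18_000), PhysicalScalar::Int32(18_500));
    println!("{:?} .. {:?}", to_arrow(min, Logical::Date), to_arrow(max, Logical::Date));
}
```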


-- 
This is an automated message from the Apache Git Service.