pitrou commented on PR #43705:
URL: https://github.com/apache/arrow/pull/43705#issuecomment-2302096692

   There are various points being made here:
   1. `ArrayData` is mutable: no, it isn't, except when first populating it.
   2. Mutating an `ArrayData` should reset its statistics: I don't see why. For 
example, mutation doesn't reset its `null_count`.
   3. Statistics would be better on `Datum`: but statistics pertain to an 
Array. If the Datum contains e.g. a RecordBatch, we would probably like to have 
per-column statistics (one per Array), rather than none at all (the null_count 
or max_value of a RecordBatch is ill-defined).
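
   To make point 3 concrete, here is a minimal sketch of what "one statistics 
entry per column" means. It deliberately does not use Arrow's real classes: 
`Column`, `ColumnStatistics`, `ComputeStatistics` and `ComputeBatchStatistics` 
are hypothetical stand-ins invented for illustration, and the values are 
restricted to nullable 64-bit integers.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for per-array statistics (NOT arrow::ArrayStatistics).
// Every field is optional because a statistic may simply be unknown.
struct ColumnStatistics {
  std::optional<int64_t> null_count;
  std::optional<int64_t> min;
  std::optional<int64_t> max;
};

// Hypothetical stand-in for one array/column of nullable int64 values.
struct Column {
  std::string name;
  std::vector<std::optional<int64_t>> values;
};

// Statistics are well-defined at the level of a single column.
ColumnStatistics ComputeStatistics(const Column& column) {
  ColumnStatistics stats;
  int64_t nulls = 0;
  for (const auto& value : column.values) {
    if (!value.has_value()) {
      ++nulls;
      continue;
    }
    if (!stats.min.has_value() || *value < *stats.min) stats.min = *value;
    if (!stats.max.has_value() || *value > *stats.max) stats.max = *value;
  }
  stats.null_count = nulls;
  return stats;
}

// A batch of columns gets one statistics entry per column, rather than a
// single batch-wide min/max, which would mix unrelated columns.
std::vector<ColumnStatistics> ComputeBatchStatistics(
    const std::vector<Column>& batch) {
  std::vector<ColumnStatistics> result;
  result.reserve(batch.size());
  for (const auto& column : batch) {
    result.push_back(ComputeStatistics(column));
  }
  return result;
}
```

   In this sketch an all-null column keeps `min` and `max` unset while still 
reporting its `null_count`, which is one reason to keep each field optional.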
   
   Another question is the cost of adding a statistics structure to either 
`Array` or `ArrayData`. Currently, the `ArrayStatistics` struct is very 
lightweight (with the potential exception of string min/max values). But what 
if some day we add more detailed statistics such as quantiles?
   


