pitrou commented on PR #43705: URL: https://github.com/apache/arrow/pull/43705#issuecomment-2302096692
There are various points being made here:

1. `ArrayData` is mutable: no, it isn't, except when first populating it.
2. Mutating an `ArrayData` should reset its statistics: I don't get why. For example, it doesn't reset its `null_count`.
3. Statistics would be better on `Datum`: but statistics pertain to an Array. If the Datum contains e.g. a RecordBatch, we would probably like to have per-column statistics (one per Array), rather than none at all (the `null_count` or `max_value` of a RecordBatch is ill-defined).

Another question is the cost of adding a statistics structure to either `Array` or `ArrayData`. Currently, the `ArrayStatistics` struct is very lightweight (with the potential exception of string min/max values). But what if some day we add more detailed statistics such as quantiles?
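To make the per-column and cost points concrete, here is a rough sketch. It does not use Arrow's actual types: `SketchStatistics`, `ArrayInline`, `ArrayWithPointer`, `SketchBatch`, `ColumnNullCounts` and `MakeExampleBatch` are all hypothetical names made up for illustration. It compares storing the statistics inline in every array with storing a single pointer that is null when no statistics are known, and shows statistics being read one column at a time rather than for the batch as a whole.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for a lightweight statistics struct.
// This is an assumption for illustration, not Arrow's definition.
struct SketchStatistics {
  using Value = std::variant<bool, int64_t, uint64_t, double, std::string>;
  std::optional<int64_t> null_count;
  std::optional<Value> min;
  std::optional<Value> max;
};

// Option A: every array pays for the full struct, even when it is empty.
struct ArrayInline {
  int64_t length = 0;
  SketchStatistics statistics;
};

// Option B: every array pays for one pointer; the struct is allocated
// only when statistics are actually known.
struct ArrayWithPointer {
  int64_t length = 0;
  std::shared_ptr<SketchStatistics> statistics;  // null means "unknown"
};

// Hypothetical record batch: it has no statistics of its own,
// only the ones attached to each column.
struct SketchBatch {
  std::vector<ArrayWithPointer> columns;
};

// Returns one null count per column, or std::nullopt where it is unknown.
std::vector<std::optional<int64_t>> ColumnNullCounts(const SketchBatch& batch) {
  std::vector<std::optional<int64_t>> counts;
  counts.reserve(batch.columns.size());
  for (const auto& column : batch.columns) {
    if (column.statistics) {
      counts.push_back(column.statistics->null_count);
    } else {
      counts.push_back(std::nullopt);
    }
  }
  return counts;
}

// Builds a two-column batch: the first column has statistics, the second has none.
SketchBatch MakeExampleBatch() {
  ArrayWithPointer with_stats;
  with_stats.length = 4;
  with_stats.statistics = std::make_shared<SketchStatistics>();
  with_stats.statistics->null_count = 1;
  with_stats.statistics->min = SketchStatistics::Value{int64_t{-3}};
  with_stats.statistics->max = SketchStatistics::Value{int64_t{7}};

  ArrayWithPointer without_stats;
  without_stats.length = 4;

  SketchBatch batch;
  batch.columns.push_back(with_stats);
  batch.columns.push_back(without_stats);
  // The pointer-based layout should be the smaller of the two.
  assert(sizeof(ArrayWithPointer) < sizeof(ArrayInline));
  return batch;
}
```

With the pointer layout, adding heavier fields later (quantiles, say) would only grow arrays that actually carry statistics, at the price of an extra allocation and indirection. Whether that trade-off suits `Array` or `ArrayData` is exactly the open question above.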
