AdamGS commented on code in PR #6848:
URL: https://github.com/apache/arrow-rs/pull/6848#discussion_r1878143554
##########
parquet/src/arrow/arrow_reader/statistics.rs:
##########
@@ -1432,6 +1432,24 @@ impl<'a> StatisticsConverter<'a> {
Ok(UInt64Array::from_iter(null_counts))
}
+ /// Extract the uncompressed sizes from row group statistics in [`RowGroupMetaData`]
Review Comment:
My current plan is to report `unencoded_byte_array_data_bytes` for
`BYTE_ARRAY` columns and `width * num_values` for the other types, which in my
mind is the amount of "information stored".
Consumers like DataFusion can then add any known overheads (like Arrow
offset arrays, etc.).
The other option I can think of is reporting the value size and letting
callers do any arithmetic they find useful (like multiplying by the number of
values, etc.). Would love to hear your thoughts.
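
To make the two options concrete, here is a rough sketch of the arithmetic. It is not the code in this PR: `PhysicalType`, `ColumnChunkInfo`, `value_width`, `uncompressed_size` and `with_arrow_offsets` are hypothetical names made up for illustration and are not the `parquet` crate's types. Booleans are left out because they are bit-packed, and the offset overhead assumes 32-bit Arrow offsets with one entry per value plus one.

```rust
/// Hypothetical stand-in for Parquet physical types (illustration only,
/// not the `parquet` crate's enum). BOOLEAN is omitted on purpose.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PhysicalType {
    Int32,
    Int64,
    Int96,
    Float,
    Double,
    FixedLenByteArray(u64),
    ByteArray,
}

/// Hypothetical per-column-chunk inputs taken from row group metadata.
pub struct ColumnChunkInfo {
    pub physical_type: PhysicalType,
    pub num_values: u64,
    /// Only meaningful for BYTE_ARRAY columns; may be absent in older files.
    pub unencoded_byte_array_data_bytes: Option<u64>,
}

/// Option 2: report only the per-value width and let callers multiply.
/// Returns `None` for BYTE_ARRAY, which has no fixed width.
pub fn value_width(t: PhysicalType) -> Option<u64> {
    match t {
        PhysicalType::Int32 | PhysicalType::Float => Some(4),
        PhysicalType::Int64 | PhysicalType::Double => Some(8),
        PhysicalType::Int96 => Some(12),
        PhysicalType::FixedLenByteArray(n) => Some(n),
        PhysicalType::ByteArray => None,
    }
}

/// Option 1: report the "information stored" directly.
/// Fixed-width types use width * num_values; BYTE_ARRAY uses the
/// unencoded byte count when the file provides it.
pub fn uncompressed_size(col: &ColumnChunkInfo) -> Option<u64> {
    match value_width(col.physical_type) {
        Some(width) => width.checked_mul(col.num_values),
        None => col.unencoded_byte_array_data_bytes,
    }
}

/// Consumer-side adjustment (e.g. something DataFusion might do):
/// add an assumed 4-byte offset per value, plus one trailing offset.
pub fn with_arrow_offsets(data_bytes: u64, num_values: u64) -> Option<u64> {
    num_values
        .checked_add(1)?
        .checked_mul(4)?
        .checked_add(data_bytes)
}
```

With option 1, a caller reads the size for each row group and only adds overheads it knows about. With option 2, the caller gets `value_width` and does the multiplication itself, but still needs a separate path for `BYTE_ARRAY`.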