alamb commented on code in PR #6848:
URL: https://github.com/apache/arrow-rs/pull/6848#discussion_r1881993836


##########
parquet/src/arrow/arrow_reader/statistics.rs:
##########
@@ -1432,6 +1432,24 @@ impl<'a> StatisticsConverter<'a> {
         Ok(UInt64Array::from_iter(null_counts))
     }
 
+    /// Extract the uncompressed sizes from row group statistics in [`RowGroupMetaData`]

Review Comment:
   In my opinion:
   - the parquet crate's API already exposes the `unencoded_byte_array_data_bytes` metric, so users can already do whatever arithmetic they want with it (see the sketch at the end of this comment); simply adding `unencoded_byte_array_data_bytes` to the `StatisticsConverter` is not very helpful (if anything, returning `unencoded_byte_array_data_bytes` as an arrow array makes the values harder to use)
   - something I do think could be valuable is a way to calculate the memory required for a given amount of arrow data (e.g. a 100 row Int64 array), but that is probably worth its own ticket / discussion
   
   I suggest proceeding with https://github.com/apache/datafusion/issues/7548 by adding code there first / figuring out the real use case, and then upstreaming any common pattern that emerges to arrow-rs
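
   A minimal sketch of the first point, assuming the existing `ColumnChunkMetaData::unencoded_byte_array_data_bytes` accessor together with the usual `ParquetMetaData::row_groups` / `RowGroupMetaData::column` accessors (the helper function name here is hypothetical, not part of the crate):

   ```rust
   use parquet::file::metadata::ParquetMetaData;

   /// Hypothetical helper: sum `unencoded_byte_array_data_bytes` for one column
   /// across all row groups. Returns `None` if any row group does not record
   /// the statistic.
   fn total_unencoded_bytes(metadata: &ParquetMetaData, column_idx: usize) -> Option<i64> {
       metadata
           .row_groups()
           .iter()
           .map(|rg| rg.column(column_idx).unencoded_byte_array_data_bytes())
           .sum()
   }
   ```

   Callers can then do whatever arithmetic they need on the plain value, without the extra step of unpacking an arrow array.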


