AdamGS commented on code in PR #5181:
URL: https://github.com/apache/arrow-rs/pull/5181#discussion_r1419036780
##########
parquet/src/column/writer/mod.rs:
##########
@@ -764,19 +764,22 @@ impl<'a, E: ColumnValueEncoder> GenericColumnWriter<'a, E> {
self.column_metrics.num_column_nulls += self.page_metrics.num_page_nulls;
- let page_statistics = match (values_data.min_value, values_data.max_value) {
- (Some(min), Some(max)) => {
- update_min(&self.descr, &min, &mut self.column_metrics.min_column_value);
- update_max(&self.descr, &max, &mut self.column_metrics.max_column_value);
- Some(ValueStatistics::new(
- Some(min),
- Some(max),
- None,
- self.page_metrics.num_page_nulls,
- false,
- ))
- }
- _ => None,
+ let page_statistics = if let (Some(min), Some(max)) =
Review Comment:
There's
[this](https://github.com/apache/arrow-rs/blob/490c080e5ba7a50efc862da9508e6669900549ee/parquet/src/column/writer/mod.rs#L347)
branch that calculates chunk statistics directly when the statistics level is
`EnabledStatistics::Chunk`. I think making that the default (for both `Page`
and `Chunk`) would probably simplify the code, since you wouldn't have to keep
track of the chunk-level metadata when adding pages. It might require a bit
more work, though, which is why I didn't end up going that way.
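
To illustrate the trade-off, here is a minimal sketch of the two strategies. It is not the writer's actual code: `MinMax`, `chunk_stats_via_pages`, and `chunk_stats_direct` are hypothetical names, and plain `i32` values stand in for the real column types. The first function folds each page's min/max into the chunk as pages are added; the second updates the chunk range directly from the values and ignores page boundaries.

```rust
/// Hypothetical stand-in for a min/max pair; not a type from the parquet crate.
#[derive(Debug, Default, Clone, PartialEq)]
struct MinMax {
    min: Option<i32>,
    max: Option<i32>,
}

impl MinMax {
    /// Widen the range to include a single value.
    fn update(&mut self, v: i32) {
        self.min = Some(self.min.map_or(v, |m| m.min(v)));
        self.max = Some(self.max.map_or(v, |m| m.max(v)));
    }

    /// Widen the range to include another range, e.g. a finished page.
    fn merge(&mut self, other: &MinMax) {
        if let Some(v) = other.min {
            self.update(v);
        }
        if let Some(v) = other.max {
            self.update(v);
        }
    }
}

/// Strategy 1: compute per-page statistics, then fold each page into the chunk.
/// The chunk-level state has to be kept in sync every time a page is added.
fn chunk_stats_via_pages(pages: &[Vec<i32>]) -> MinMax {
    let mut chunk = MinMax::default();
    for page in pages {
        let mut page_stats = MinMax::default();
        for &v in page {
            page_stats.update(v);
        }
        chunk.merge(&page_stats);
    }
    chunk
}

/// Strategy 2: update the chunk statistics directly as values are written,
/// independent of how the values are split into pages.
fn chunk_stats_direct(pages: &[Vec<i32>]) -> MinMax {
    let mut chunk = MinMax::default();
    for &v in pages.iter().flatten() {
        chunk.update(v);
    }
    chunk
}

fn main() {
    // Three pages, one of them empty (so it contributes no min/max).
    let pages = vec![vec![3, 7, 1], vec![], vec![9, 4]];
    assert_eq!(chunk_stats_via_pages(&pages), chunk_stats_direct(&pages));
    println!("{:?}", chunk_stats_direct(&pages));
}
```

Both strategies produce the same chunk-level range. The difference is only where the bookkeeping lives: in the page-finishing path, or in the value-writing path.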