tustvold commented on code in PR #5181:
URL: https://github.com/apache/arrow-rs/pull/5181#discussion_r1418970317
##########
parquet/src/column/writer/mod.rs:
##########
@@ -764,19 +764,22 @@ impl<'a, E: ColumnValueEncoder> GenericColumnWriter<'a, E> {
         self.column_metrics.num_column_nulls += self.page_metrics.num_page_nulls;
-        let page_statistics = match (values_data.min_value, values_data.max_value) {
-            (Some(min), Some(max)) => {
-                update_min(&self.descr, &min, &mut self.column_metrics.min_column_value);
-                update_max(&self.descr, &max, &mut self.column_metrics.max_column_value);
-                Some(ValueStatistics::new(
-                    Some(min),
-                    Some(max),
-                    None,
-                    self.page_metrics.num_page_nulls,
-                    false,
-                ))
-            }
-            _ => None,
+        let page_statistics = if let (Some(min), Some(max)) =
Review Comment:
Looking at the code, it appears this was intended to work by having the encoder
itself skip computing page statistics unless `EnabledStatistics::Page` is set.
With `EnabledStatistics::Chunk`, the column stats are computed at a higher level
in `write_batch_internal`. What do you think about pushing this condition into
`ByteArrayEncoder`, as is done for `ColumnValueEncoderImpl`?
Edit: I'll have a play and see if I can simplify this as a follow-up; the
current behaviour is rather confusing imo.
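
For reference, a minimal, self-contained sketch of what gating per-page min/max
tracking behind `EnabledStatistics::Page` inside an encoder could look like.
This is not the actual parquet crate code: `SketchEncoder` and its fields are
illustrative assumptions standing in for `ByteArrayEncoder` /
`ColumnValueEncoderImpl`, and chunk-level statistics are assumed to be handled
by the column writer at a higher level.

```rust
// Illustrative sketch only: `SketchEncoder` and its fields stand in for the
// parquet crate's encoder machinery; they are not the real `ByteArrayEncoder`
// or `ColumnValueEncoderImpl` API.

#[derive(Clone, Copy, PartialEq, Eq)]
enum EnabledStatistics {
    None,
    Chunk,
    Page,
}

/// Hypothetical encoder that only tracks per-page min/max when page-level
/// statistics are requested; with `Chunk`, statistics are expected to be
/// computed by the column writer at a higher level instead.
struct SketchEncoder {
    statistics_enabled: EnabledStatistics,
    min_value: Option<Vec<u8>>,
    max_value: Option<Vec<u8>>,
}

impl SketchEncoder {
    fn new(statistics_enabled: EnabledStatistics) -> Self {
        Self {
            statistics_enabled,
            min_value: None,
            max_value: None,
        }
    }

    fn write(&mut self, values: &[&[u8]]) {
        // Only compute per-page min/max when page statistics are enabled;
        // otherwise skip the work entirely, mirroring the condition the
        // review suggests pushing into the encoder.
        if self.statistics_enabled == EnabledStatistics::Page {
            for v in values {
                if self.min_value.as_deref().map_or(true, |m| *v < m) {
                    self.min_value = Some(v.to_vec());
                }
                if self.max_value.as_deref().map_or(true, |m| *v > m) {
                    self.max_value = Some(v.to_vec());
                }
            }
        }
        // ... actual value encoding into the data page would happen here ...
    }
}

fn main() {
    let mut encoder = SketchEncoder::new(EnabledStatistics::Page);
    encoder.write(&[b"banana".as_slice(), b"apple", b"cherry"]);
    assert_eq!(encoder.min_value.as_deref(), Some(b"apple".as_slice()));
    assert_eq!(encoder.max_value.as_deref(), Some(b"cherry".as_slice()));

    // With chunk-level statistics, the encoder records nothing per page.
    let mut encoder = SketchEncoder::new(EnabledStatistics::Chunk);
    encoder.write(&[b"banana".as_slice(), b"apple"]);
    assert!(encoder.min_value.is_none() && encoder.max_value.is_none());
}
```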
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]