tustvold commented on code in PR #5181:
URL: https://github.com/apache/arrow-rs/pull/5181#discussion_r1418970317


##########
parquet/src/column/writer/mod.rs:
##########
@@ -764,19 +764,22 @@ impl<'a, E: ColumnValueEncoder> GenericColumnWriter<'a, 
E> {
 
         self.column_metrics.num_column_nulls += 
self.page_metrics.num_page_nulls;
 
-        let page_statistics = match (values_data.min_value, 
values_data.max_value) {
-            (Some(min), Some(max)) => {
-                update_min(&self.descr, &min, &mut 
self.column_metrics.min_column_value);
-                update_max(&self.descr, &max, &mut 
self.column_metrics.max_column_value);
-                Some(ValueStatistics::new(
-                    Some(min),
-                    Some(max),
-                    None,
-                    self.page_metrics.num_page_nulls,
-                    false,
-                ))
-            }
-            _ => None,
+        let page_statistics = if let (Some(min), Some(max)) =

Review Comment:
   Looking at the code it would appear the way this was intended to work is for 
the encoder itself to not compute page statistics unless 
EnabledStatistics::Page. If `EnabledStatistics::Chunk` the column stats are 
computed at a higher-level in write_batch_internal. What do you think about 
pushing this condition into `ByteArrayEncoder` like it is for 
`ColumnValueEncoderImpl`?
   
   Edit: I'll have a play to see if I can't simplify this as a follow up, the 
current behaviour is rather confusing imo



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to