alamb commented on code in PR #5183:
URL: https://github.com/apache/arrow-rs/pull/5183#discussion_r1420758889


##########
parquet/src/column/writer/mod.rs:
##########
@@ -329,28 +329,11 @@ impl<'a, E: ColumnValueEncoder> GenericColumnWriter<'a, 
E> {
             None => values.len(),
         };
 
-        // If only computing chunk-level statistics compute them here, 
page-level statistics
-        // are computed in [`Self::write_mini_batch`] and used to update chunk 
statistics in
-        // [`Self::add_data_page`]
-        if self.statistics_enabled == EnabledStatistics::Chunk

Review Comment:
   Are there any performance implications of always computing the statistics 
even when they aren't written to the page?
   
   Though it seems like if we are writing column chunk level statistics anyways 
we would have to calculate the min/max across all values anyways, so tracking 
per page shouldn't be any more expensive 
   
   



##########
parquet/src/column/writer/mod.rs:
##########
@@ -329,28 +329,11 @@ impl<'a, E: ColumnValueEncoder> GenericColumnWriter<'a, 
E> {
             None => values.len(),
         };
 
-        // If only computing chunk-level statistics compute them here, 
page-level statistics
-        // are computed in [`Self::write_mini_batch`] and used to update chunk 
statistics in
-        // [`Self::add_data_page`]
-        if self.statistics_enabled == EnabledStatistics::Chunk
-            // INTERVAL has undefined sort order, so don't write min/max stats 
for it
-            && self.descr.converted_type() != ConvertedType::INTERVAL
-        {
-            match (min, max) {
-                (Some(min), Some(max)) => {
-                    update_min(&self.descr, min, &mut 
self.column_metrics.min_column_value);
-                    update_max(&self.descr, max, &mut 
self.column_metrics.max_column_value);
-                }
-                (None, Some(_)) | (Some(_), None) => {
-                    panic!("min/max should be both set or both None")
-                }
-                (None, None) => {
-                    if let Some((min, max)) = self.encoder.min_max(values, 
value_indices) {
-                        update_min(&self.descr, &min, &mut 
self.column_metrics.min_column_value);
-                        update_max(&self.descr, &max, &mut 
self.column_metrics.max_column_value);
-                    }
-                }
-            };
+        if let Some(min) = min {

Review Comment:
   This min is only passed in when precomputed statistics are used 
(`write_batch_with_statistics`). Is that correct? I couldn't find another 
callsite where `min` and `max` were non zero



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to