emcake commented on code in PR #5076:
URL: https://github.com/apache/arrow-rs/pull/5076#discussion_r1392798590


##########
parquet/src/file/statistics.rs:
##########
@@ -152,6 +158,12 @@ pub fn from_thrift(
                 stats.max_value
             };
 
+            // Whether or not the min/max values are exact. Due to pre-existing truncation
+            // in other libraries such as parquet-mr, we can't assume that any given parquet file

Review Comment:
   No, parquet-mr only applies this to binary statistics: 
https://github.com/apache/parquet-mr/pull/696/files#diff-1afc9f89a782ddd4e7cd17546ca048954091627d7a31597ab88892eb2a7a76abR618
   
   Pertaining to the conversation above as well: I could reduce churn by only 
allowing min/max exactness to be set on the constructors for binary-like 
stats, splitting the `statistics_new_func` macro into 
`statistics_new_func_always_exact` and `statistics_new_func_inexact`, where the 
latter also generates a `binary_with_inexact` method? Given that there's only 
one place in the code, in `column/mod.rs`, where we set these to something other 
than `true`, this would reduce the churn significantly.


