emcake commented on code in PR #5076:
URL: https://github.com/apache/arrow-rs/pull/5076#discussion_r1392798590
##########
parquet/src/file/statistics.rs:
##########
@@ -152,6 +158,12 @@ pub fn from_thrift(
stats.max_value
};
+ // Whether or not the min/max values are exact. Due to pre-existing truncation
+ // in other libraries such as parquet-mr, we can't assume that any given parquet file
Review Comment:
No, parquet-mr only applies this truncation to binary statistics:
https://github.com/apache/parquet-mr/pull/696/files#diff-1afc9f89a782ddd4e7cd17546ca048954091627d7a31597ab88892eb2a7a76abR618
Regarding the conversation above as well: could I reduce churn by only allowing
min/max exactness to be set on the constructors for binary-like stats? That
would mean splitting the `statistics_new_func` macro into a
`statistics_new_func_always_exact` and a `statistics_new_func_inexact`, where
the latter also generates a `binary_with_inexact` method. Given that there's
only one place in the code, in `column/mod.rs`, where we set these to something
other than `true`, this would reduce the churn significantly.
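
To make the proposal concrete, here is a rough sketch of how the split could look. It is not the code in this PR: `ValueStatistics` and `Statistics` below are simplified, hypothetical stand-ins for the real types, the constructor names `int32` and `binary` and the helpers `with_exactness` and `exactness` are made up for illustration, and only the two macro names and `binary_with_inexact` come from the suggestion above.

```rust
// Simplified, hypothetical stand-ins for the parquet statistics types.
#[derive(Debug, Clone, PartialEq)]
pub struct ValueStatistics<T> {
    pub min: Option<T>,
    pub max: Option<T>,
    pub is_min_value_exact: bool,
    pub is_max_value_exact: bool,
}

impl<T> ValueStatistics<T> {
    /// Builds statistics whose min/max are assumed to be exact.
    pub fn new(min: Option<T>, max: Option<T>) -> Self {
        Self { min, max, is_min_value_exact: true, is_max_value_exact: true }
    }

    /// Overrides the exactness flags, e.g. after truncating a long binary value.
    pub fn with_exactness(mut self, min_exact: bool, max_exact: bool) -> Self {
        self.is_min_value_exact = min_exact;
        self.is_max_value_exact = max_exact;
        self
    }
}

#[derive(Debug, Clone, PartialEq)]
pub enum Statistics {
    Int32(ValueStatistics<i32>),
    ByteArray(ValueStatistics<Vec<u8>>),
}

// Constructor for types whose min/max are never truncated: always exact.
macro_rules! statistics_new_func_always_exact {
    ($func:ident, $vtype:ty, $stat:ident) => {
        pub fn $func(min: $vtype, max: $vtype) -> Self {
            Statistics::$stat(ValueStatistics::new(min, max))
        }
    };
}

// Constructors for binary-like types: the exact one, plus a second
// constructor that lets the caller mark min and/or max as inexact.
macro_rules! statistics_new_func_inexact {
    ($func:ident, $func_inexact:ident, $vtype:ty, $stat:ident) => {
        statistics_new_func_always_exact!($func, $vtype, $stat);

        pub fn $func_inexact(
            min: $vtype,
            max: $vtype,
            is_min_value_exact: bool,
            is_max_value_exact: bool,
        ) -> Self {
            Statistics::$stat(
                ValueStatistics::new(min, max)
                    .with_exactness(is_min_value_exact, is_max_value_exact),
            )
        }
    };
}

impl Statistics {
    statistics_new_func_always_exact!(int32, Option<i32>, Int32);
    statistics_new_func_inexact!(binary, binary_with_inexact, Option<Vec<u8>>, ByteArray);

    /// Returns `(is_min_value_exact, is_max_value_exact)`.
    pub fn exactness(&self) -> (bool, bool) {
        match self {
            Statistics::Int32(s) => (s.is_min_value_exact, s.is_max_value_exact),
            Statistics::ByteArray(s) => (s.is_min_value_exact, s.is_max_value_exact),
        }
    }
}
```

With this shape, existing call sites for non-binary stats would stay untouched, and only the one caller that truncates values would switch to `binary_with_inexact`.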
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.