tustvold commented on code in PR #5076:
URL: https://github.com/apache/arrow-rs/pull/5076#discussion_r1392780160
##########
parquet/src/file/statistics.rs:
##########
@@ -88,13 +90,17 @@ macro_rules! statistics_new_func {
distinct: Option<u64>,
nulls: u64,
is_deprecated: bool,
+ is_max_value_exact: bool,
+ is_min_value_exact: bool,
Review Comment:
Changing this function signature will result in some non-trivial code churn.
What do you think of keeping this function as-is, defaulting the values to
`true`, and then adding two methods like:
```
pub fn with_is_max_value_exact(self, exact: bool) -> Self {
...
}
pub fn with_is_min_value_exact(self, exact: bool) -> Self {
...
}
```
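A minimal sketch of the suggested builder-style approach. It assumes a simplified, hypothetical `ValueStatisticsSketch<T>` type; the real statistics types in `parquet/src/file/statistics.rs` carry more fields, and the accessor names below are illustrative only. The constructor keeps its shape and defaults both flags to `true`. Callers that know a bound is inexact opt out through the `with_*` methods, so existing call sites need no changes.

```rust
/// Hypothetical, simplified stand-in for a statistics type (illustration only).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
pub struct ValueStatisticsSketch<T> {
    min: Option<T>,
    max: Option<T>,
    is_min_value_exact: bool,
    is_max_value_exact: bool,
}

impl<T> ValueStatisticsSketch<T> {
    /// Constructor signature is unchanged; both exactness flags default to `true`.
    pub fn new(min: Option<T>, max: Option<T>) -> Self {
        Self { min, max, is_min_value_exact: true, is_max_value_exact: true }
    }

    /// Consumes `self` and returns it with the max-exactness flag overridden.
    pub fn with_is_max_value_exact(self, exact: bool) -> Self {
        Self { is_max_value_exact: exact, ..self }
    }

    /// Consumes `self` and returns it with the min-exactness flag overridden.
    pub fn with_is_min_value_exact(self, exact: bool) -> Self {
        Self { is_min_value_exact: exact, ..self }
    }

    /// Hypothetical accessor: whether the stored max is the true maximum.
    pub fn is_max_value_exact(&self) -> bool {
        self.is_max_value_exact
    }

    /// Hypothetical accessor: whether the stored min is the true minimum.
    pub fn is_min_value_exact(&self) -> bool {
        self.is_min_value_exact
    }
}
```

Because each `with_*` method takes `self` by value and returns `Self`, the calls chain, e.g. `ValueStatisticsSketch::new(min, max).with_is_min_value_exact(false).with_is_max_value_exact(false)`.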
##########
parquet/src/file/statistics.rs:
##########
@@ -152,6 +158,12 @@ pub fn from_thrift(
stats.max_value
};
+ // Whether or not the min/max values are exact. Due to pre-existing truncation
+ // in other libraries such as parquet-mr, we can't assume that any given parquet file
Review Comment:
Does parquet-mr truncate non-binary columns?
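For background on why exactness mostly matters for byte-array columns, here is a sketch of one common way to truncate binary min/max statistics to a length limit. It illustrates the general technique only and is not taken from parquet-mr or arrow-rs; the function names and the `len` parameter are hypothetical. A prefix of the min is still a valid lower bound, but a prefix of the max sorts below the original, so the last byte that can be incremented is bumped to restore an upper bound. Either way the stored value is no longer exact. Comparison here is plain byte-wise ordering, and bumping a byte can yield invalid UTF-8 for string columns. Fixed-width types such as `INT32` are only a few bytes wide, so truncating them would save little; the sketch does not answer what parquet-mr actually does.

```rust
/// Truncates `min` to at most `len` bytes. Any prefix of a byte string sorts
/// at or before the string itself, so the prefix remains a valid lower bound.
/// The returned bool reports whether the value is still exact.
pub fn truncate_min(min: &[u8], len: usize) -> (Vec<u8>, bool) {
    if min.len() <= len {
        (min.to_vec(), true)
    } else {
        (min[..len].to_vec(), false)
    }
}

/// Truncates `max` to at most `len` bytes while keeping it an upper bound.
/// A bare prefix would sort *below* the original, so the last byte that is
/// not 0xFF is incremented (trailing 0xFF bytes are dropped). Returns `None`
/// if no such bound exists, e.g. when the prefix is all 0xFF.
pub fn truncate_max(max: &[u8], len: usize) -> Option<(Vec<u8>, bool)> {
    if max.len() <= len {
        return Some((max.to_vec(), true));
    }
    let mut prefix = max[..len].to_vec();
    // Walk backwards, discarding 0xFF bytes that cannot be incremented.
    while let Some(last) = prefix.pop() {
        if last < 0xFF {
            prefix.push(last + 1);
            return Some((prefix, false));
        }
    }
    None
}
```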