berkaysynnada commented on code in PR #8172:
URL: https://github.com/apache/arrow-datafusion/pull/8172#discussion_r1393733385
##########
datafusion/core/src/datasource/statistics.rs:
##########
@@ -211,49 +199,3 @@ pub(crate) fn get_col_stats(
})
.collect()
}
-
-/// If the given value is numerically greater than the original maximum value,
-/// return the new maximum value with appropriate exactness information.
-fn set_max_if_greater(
- max_nominee: Precision<ScalarValue>,
- max_values: Precision<ScalarValue>,
-) -> Precision<ScalarValue> {
- match (&max_values, &max_nominee) {
- (Precision::Exact(val1), Precision::Exact(val2)) if val1 < val2 => max_nominee,
- (Precision::Exact(val1), Precision::Inexact(val2))
- | (Precision::Inexact(val1), Precision::Inexact(val2))
- | (Precision::Inexact(val1), Precision::Exact(val2))
- if val1 < val2 =>
- {
- max_nominee.to_inexact()
- }
- (Precision::Exact(_), Precision::Absent) => max_values.to_inexact(),
- (Precision::Absent, Precision::Exact(_)) => max_nominee.to_inexact(),
- (Precision::Absent, Precision::Inexact(_)) => max_nominee,
- (Precision::Absent, Precision::Absent) => Precision::Absent,
- _ => max_values,
- }
-}
-
-/// If the given value is numerically lesser than the original minimum value,
Review Comment:
Same here
##########
datafusion/core/src/datasource/statistics.rs:
##########
@@ -160,17 +159,6 @@ pub(crate) fn create_max_min_accs(
(max_values, min_values)
}
-fn add_row_stats(
Review Comment:
I think they don't share the same behavior, but the difference didn't show up
because of a lack of tests. The `add()` function operates in the safest way: if
either operand is absent, the result is also absent. The removed
`add_row_stats()` function, on the other hand, keeps the non-absent value and
relaxes its exactness (absent + exact(value) => inexact(value)).
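
To make the difference concrete, here is a minimal sketch of the two behaviors. It uses a simplified, local `Precision` enum over `usize` as a stand-in for the real DataFusion type, and `lenient_add` is a hypothetical name for the behavior of the removed helper as I understand it, so treat it as an illustration rather than the actual code:

```rust
// Simplified stand-in for DataFusion's `Precision<usize>` (assumption: the
// real type is generic and has more methods; only what is needed is modelled).
#[derive(Debug, Clone, PartialEq)]
enum Precision {
    Exact(usize),
    Inexact(usize),
    Absent,
}

impl Precision {
    /// Demote an exact value to inexact; other variants are unchanged.
    fn to_inexact(self) -> Precision {
        match self {
            Precision::Exact(v) => Precision::Inexact(v),
            other => other,
        }
    }

    /// Strict addition: if either operand is absent, the result is absent.
    fn add(self, other: Precision) -> Precision {
        match (self, other) {
            (Precision::Exact(a), Precision::Exact(b)) => Precision::Exact(a + b),
            (Precision::Exact(a), Precision::Inexact(b))
            | (Precision::Inexact(a), Precision::Exact(b))
            | (Precision::Inexact(a), Precision::Inexact(b)) => Precision::Inexact(a + b),
            _ => Precision::Absent,
        }
    }
}

/// Lenient addition (hypothetical name): keep the known operand and relax
/// its exactness when the other operand is absent.
fn lenient_add(file_num_rows: Precision, num_rows: Precision) -> Precision {
    match (file_num_rows, num_rows) {
        (Precision::Absent, rhs) => rhs.to_inexact(),
        (lhs, Precision::Absent) => lhs.to_inexact(),
        (lhs, rhs) => lhs.add(rhs),
    }
}
```

With these definitions, `Precision::Absent.add(Precision::Exact(10))` yields `Absent`, while `lenient_add(Precision::Absent, Precision::Exact(10))` yields `Inexact(10)`. A test with one absent operand would have caught the divergence.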
##########
datafusion/core/src/datasource/statistics.rs:
##########
@@ -211,49 +199,3 @@ pub(crate) fn get_col_stats(
})
.collect()
}
-
-/// If the given value is numerically greater than the original maximum value,
-/// return the new maximum value with appropriate exactness information.
-fn set_max_if_greater(
Review Comment:
There is a similar difference here as well. The removed function conserves the
known max value, relaxing its exactness, when an absent statistic is read from
the file.
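
A small sketch of what I mean, modelled on the removed helper shown in the diff. It assumes a simplified local `Precision` over `i64` instead of `Precision<ScalarValue>`, and `strict_max` is a hypothetical name for a max that propagates absence in the same "safest" way as `add()`:

```rust
// Simplified stand-in for `Precision<ScalarValue>` (assumption: i64 values).
#[derive(Debug, Clone, PartialEq)]
enum Precision {
    Exact(i64),
    Inexact(i64),
    Absent,
}

impl Precision {
    /// Demote an exact value to inexact; other variants are unchanged.
    fn to_inexact(self) -> Precision {
        match self {
            Precision::Exact(v) => Precision::Inexact(v),
            other => other,
        }
    }
}

/// Strict max (hypothetical name): an absent operand makes the result absent.
fn strict_max(a: &Precision, b: &Precision) -> Precision {
    match (a, b) {
        (Precision::Exact(x), Precision::Exact(y)) => Precision::Exact((*x).max(*y)),
        (Precision::Exact(x), Precision::Inexact(y))
        | (Precision::Inexact(x), Precision::Exact(y))
        | (Precision::Inexact(x), Precision::Inexact(y)) => Precision::Inexact((*x).max(*y)),
        _ => Precision::Absent,
    }
}

/// Lenient max, following the arms of the removed `set_max_if_greater`:
/// a known max survives an absent statistic, but only as an inexact value.
fn set_max_if_greater(max_nominee: Precision, max_values: Precision) -> Precision {
    match (&max_values, &max_nominee) {
        (Precision::Exact(val1), Precision::Exact(val2)) if val1 < val2 => max_nominee,
        (Precision::Exact(val1), Precision::Inexact(val2))
        | (Precision::Inexact(val1), Precision::Inexact(val2))
        | (Precision::Inexact(val1), Precision::Exact(val2))
            if val1 < val2 =>
        {
            max_nominee.to_inexact()
        }
        (Precision::Exact(_), Precision::Absent) => max_values.to_inexact(),
        (Precision::Absent, Precision::Exact(_)) => max_nominee.to_inexact(),
        (Precision::Absent, Precision::Inexact(_)) => max_nominee,
        (Precision::Absent, Precision::Absent) => Precision::Absent,
        _ => max_values,
    }
}
```

For a running max of `Exact(5)` and a file that reports no max, `strict_max` returns `Absent`, whereas `set_max_if_greater(Precision::Absent, Precision::Exact(5))` returns `Inexact(5)`.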