sweb opened a new pull request, #22892:
URL: https://github.com/apache/datafusion/pull/22892
Closes #22687
## Rationale for this change
The distance API in `datafusion/common/src/scalar/mod.rs` previously
returned `Option<usize>`. The width of `usize` follows the target's pointer
size, so the result depended on the architecture rather than on the
cardinality of the value domain. This could lead to target-dependent
behavior on large integer/temporal ranges. Additionally, downstream callers
like `interval_arithmetic.rs` had to convert the distance back to `u64` to
compute cardinality.
Exposing an overflow-aware `u64`-oriented contract (`distance_u64`) removes
these architecture differences and aligns the API with value-domain semantics.
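To illustrate the width problem described above, here is a minimal sketch in plain Rust. It does not use DataFusion: `int_distance_u64` and `narrow_to_32_bit` are hypothetical helpers, and `u32` stands in for `usize` on a 32-bit target so the behavior can be reproduced on any machine.

```rust
use std::convert::TryFrom;

/// Hypothetical helper (not part of DataFusion): absolute distance between
/// two `i64` values. `abs_diff` returns `u64`, so the full signed range fits.
fn int_distance_u64(a: i64, b: i64) -> u64 {
    a.abs_diff(b)
}

/// Hypothetical helper: narrow a distance to 32 bits, standing in for what a
/// `usize` result would have to do on a 32-bit target.
fn narrow_to_32_bit(d: u64) -> Option<u32> {
    u32::try_from(d).ok()
}
```

Under these assumptions, `int_distance_u64(i64::MIN, i64::MAX)` is exactly `u64::MAX`, while narrowing that same distance to 32 bits yields `None`. The same pair of inputs is representable or not depending only on the result width.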
## What changes are included in this PR?
- Added `distance_u64`: a new public method `distance_u64(&self,
other: &ScalarValue) -> Option<u64>` on `ScalarValue`.
- Deprecated `distance`: Marked the original `distance(&self, other:
&ScalarValue) -> Option<usize>` method as deprecated and redirected it to call
`distance_u64`.
- Interval Cardinality: Migrated the cardinality calculation in
`datafusion/expr-common/src/interval_arithmetic.rs` to use `distance_u64`
directly.
- Selectivity / Stats Overlap: Migrated the overlap calculations in
`datafusion/common/src/stats.rs` to use `distance_u64`.
- Boundary/Overflow Tests: Added `test_scalar_distance_u64_boundaries` in
`scalar/mod.rs` to verify edge cases:
  - Full signed range edge (`i64::MIN` to `i64::MAX`)
  - Full unsigned range edge (`u64::MIN` to `u64::MAX`)
  - Large temporal range edge (`TimestampSecond` and `Date32` boundaries)
  - Overflow-to-None behavior (distances exceeding `u64::MAX` for Float,
    `Decimal128`, and `Decimal256` values)
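The overflow-to-None edge cases listed above can be sketched in plain Rust. This is an illustration under stated assumptions, not the PR's implementation: `wide_distance_u64` and `date32_distance_u64` are hypothetical names, an `i128` stands in for the unscaled value of a `Decimal128`, and an `i32` stands in for a `Date32` day count.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: distance between two 128-bit values, reported as
/// `None` when it does not fit in `u64`.
fn wide_distance_u64(a: i128, b: i128) -> Option<u64> {
    // `abs_diff` on `i128` returns `u128`, so the subtraction cannot overflow;
    // only the final narrowing to `u64` can fail.
    u64::try_from(a.abs_diff(b)).ok()
}

/// Hypothetical helper: distance between two 32-bit day counts. The widest
/// possible result is `u32::MAX`, which always fits in `u64`.
fn date32_distance_u64(a: i32, b: i32) -> u64 {
    u64::from(a.abs_diff(b))
}
```

With these helpers, a distance of exactly `u64::MAX` is still representable, while one step beyond it yields `None` instead of wrapping or panicking.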
## Are these changes tested?
Yes, they are covered by the new unit tests in `datafusion-common` and
existing test suites in both `datafusion-common` and `datafusion-expr-common`.
## Are there any user-facing changes?
Yes, `ScalarValue::distance` has been deprecated in favor of
`ScalarValue::distance_u64`.
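For readers unfamiliar with this deprecation pattern, the following is a rough sketch of a deprecated method delegating to its replacement. `Int64Scalar` is a hypothetical stand-in for `ScalarValue`, restricted to 64-bit integers, and the `usize::try_from` narrowing is an assumption about how the old return type might be preserved, not a description of the actual DataFusion code.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for `ScalarValue`, holding only an `i64`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Int64Scalar(pub i64);

impl Int64Scalar {
    /// Width-independent distance between two values.
    pub fn distance_u64(&self, other: &Int64Scalar) -> Option<u64> {
        Some(self.0.abs_diff(other.0))
    }

    /// Old entry point, kept for compatibility. It delegates to
    /// `distance_u64` and returns `None` if the result does not fit in
    /// `usize` on the current target (assumed behavior).
    #[deprecated(note = "use `distance_u64` instead")]
    pub fn distance(&self, other: &Int64Scalar) -> Option<usize> {
        self.distance_u64(other)
            .and_then(|d| usize::try_from(d).ok())
    }
}
```

Callers of the old method would see a compiler deprecation warning and can migrate by switching to `distance_u64`, converting to `usize` themselves only where an index-sized value is really needed.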