kosiew opened a new issue, #22061:
URL: https://github.com/apache/datafusion/issues/22061
## Summary
`min_max_scalar` in `datafusion/functions-aggregate-common/src/min_max.rs`
was extracted to make error-path control flow testable, but the core scalar
dispatch is still encoded in the large `min_max_scalar_impl!` macro. This macro
packs type matching, validation, recursion, and return-flow behavior into
non-local expansion paths that are harder to step through, test in isolation,
and evolve safely.
This issue proposes replacing macro-heavy scalar dispatch with a
function-first implementation that preserves current semantics and error
messages while improving testability and maintainability.
## Current State
- Scalar dispatch is implemented in `min_max_scalar_impl!` with many match
arms.
- Dictionary recursion and dictionary-key-type invariants are embedded
inside macro expansion.
- Decimal compatibility checks and some incompatible-type errors are spread
across macro arms.
- `min_max_scalar` currently wraps macro invocation and maps `Ordering` to
`min` or `max` semantics.
## Problem
- Macro expansion hides control flow and increases debugging friction.
- It is difficult to unit test small dispatch decisions without exercising
large macro branches.
- Future behavior changes risk reintroducing non-local control-flow mistakes.
- Compiler errors and review diffs are harder to reason about in a large
macro body.
## Goals
- Move scalar dispatch to regular Rust functions with explicit control flow.
- Preserve all externally visible behavior:
- Same min/max results for all currently supported scalar variants.
- Same dictionary semantics and key-type mismatch invariants.
- Same decimal precision/scale checks.
- Same NaN behavior.
- Same error class and equivalent error text for incompatible inputs.
- Keep the change crate-scoped and reviewable.
## Non-Goals
- No dictionary batch optimization work in this issue.
- No semantic changes to aggregate min/max behavior.
- No broad refactor of batch kernels outside scalar dispatch.
## Proposed Approach
1. Introduce function-based dispatch helpers:
- `fn min_max_scalar(lhs: &ScalarValue, rhs: &ScalarValue, ordering:
Ordering) -> Result<ScalarValue>` remains the entrypoint.
- Add internal helpers to isolate concerns, such as:
- `fn min_max_scalar_same_variant(...) -> Result<ScalarValue>`
- `fn min_max_dictionary_scalar(...) -> Result<ScalarValue>`
- `fn ensure_decimal_compatibility(...) -> Result<()>`
2. Replace macro match arms with explicit `match (lhs, rhs)` in one or more
functions.
3. Keep current compacting and cloning behavior where relevant.
4. Preserve current dictionary rewrap behavior when key types match.
5. Remove `min_max_scalar_impl!` after parity is proven.
## Testing Plan
- Extend unit tests in
`datafusion/functions-aggregate-common/src/min_max.rs` with table-driven parity
checks over representative scalar pairs.
- Keep and validate dictionary-focused tests:
- dictionary vs scalar compare-by-inner-value
- dictionary vs dictionary same key type rewrap
- dictionary key type mismatch error
- dictionary vs incompatible scalar error
- Add targeted tests for decimal precision/scale mismatch and
fixed-size-binary size mismatch.
- Run crate-scoped tests:
- `cargo test -p datafusion-functions-aggregate-common min_max`
## Acceptance Criteria
- `min_max_scalar_impl!` is removed.
- `min_max_scalar` behavior is unchanged for all existing test cases.
- New tests cover control-flow edges previously hidden in macro expansion.
- No regressions in aggregate min/max tests that depend on scalar comparison.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]