comphead opened a new issue, #22082:
URL: https://github.com/apache/datafusion/issues/22082
### Is your feature request related to a problem or challenge?
DataFusion's `lag` and `lead` window functions only accept a **scalar**
value as the third (default) argument. The implementation in
`datafusion/functions-window/src/lead_lag.rs` extracts the default
at planning time via `get_scalar_value_from_args(input_exprs, 2)` and
stores it as a `ScalarValue` on the `WindowUDFFieldArgs` / partition evaluator
state.
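The scalar-only behavior can be modeled in a few lines. This is a sketch only. Columns are modeled as `Vec<Option<i64>>` rather than Arrow arrays, and `lag_with_scalar_default` is a hypothetical name, not a DataFusion function. The default is a single value fixed before evaluation and copied into every row whose offset row falls outside the partition:

```rust
/// Hypothetical model of LAG with a planning-time scalar default.
/// `values` is one partition, already sorted by the window's ORDER BY.
fn lag_with_scalar_default(
    values: &[Option<i64>],
    offset: usize,
    default: Option<i64>,
) -> Vec<Option<i64>> {
    (0..values.len())
        .map(|i| {
            if i >= offset {
                values[i - offset] // the offset row exists
            } else {
                default // the same value for every row without an offset row
            }
        })
        .collect()
}

fn main() {
    let a = vec![Some(10), Some(20), Some(30)];
    // LAG(a, 1, -1) over the three rows above
    println!("{:?}", lag_with_scalar_default(&a, 1, Some(-1)));
}
```

Because the default is a single `Option<i64>` in this model, nothing in the signature can vary from row to row. That is the restriction the issue describes.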
This means queries like:
```sql
SELECT
b,
LAG(a, 1, c) OVER (ORDER BY b) AS lg,
LEAD(a, 1, c) OVER (ORDER BY b) AS ld
FROM t
```
…where the default expression is a column reference (or any other non-literal
expression) cannot be planned natively in DataFusion. Spark accepts an
arbitrary `Expression` here and evaluates it per row when the offset row does
not exist. As a result, downstream projects that build Spark compatibility on
DataFusion (e.g. Apache DataFusion Comet) currently have to fall back to Spark
for this pattern.
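The intended per-row semantics can be pinned down with a small reference model. This is hypothetical code, not DataFusion or Spark code, and it assumes "per row" means the row currently being output. For one partition sorted by `b`, `LAG(a, 1, c)` takes `a` from the previous row. When that row does not exist, it takes `c` from the current row. `LEAD` does the same at the other end of the partition:

```rust
/// Hypothetical reference model: `a` and `c` are one partition's columns,
/// already sorted by the ORDER BY key; `offset` is the lag/lead distance.
fn lag_per_row_default(a: &[Option<i64>], c: &[Option<i64>], offset: usize) -> Vec<Option<i64>> {
    (0..a.len())
        // Previous row's `a` if it exists, else the current row's `c`.
        .map(|i| if i >= offset { a[i - offset] } else { c[i] })
        .collect()
}

fn lead_per_row_default(a: &[Option<i64>], c: &[Option<i64>], offset: usize) -> Vec<Option<i64>> {
    (0..a.len())
        // Following row's `a` if it exists, else the current row's `c`.
        .map(|i| if i + offset < a.len() { a[i + offset] } else { c[i] })
        .collect()
}

fn main() {
    // Rows ordered by b, with (a, c) = (1, 100), (2, 200), (3, 300)
    let a = vec![Some(1), Some(2), Some(3)];
    let c = vec![Some(100), Some(200), Some(300)];
    println!("lg = {:?}", lag_per_row_default(&a, &c, 1));
    println!("ld = {:?}", lead_per_row_default(&a, &c, 1));
}
```

With these rows, `lg` falls back to `c` only on the first row and `ld` only on the last. A scalar default cannot reproduce this, because it would fill both positions with the same value.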
Allow the third argument of `lag` / `lead` to be any expression, evaluated per
row in the partition evaluator:
1. In `parse_default_value` / partition-evaluator construction, accept an
arbitrary `PhysicalExpr` for the default rather than coercing it to a
`ScalarValue`.
2. In `evaluate_all` / `evaluate_all_with_ignore_null` (and the
`shift_with_default_value` path), when the offset row does not exist, take the
default from the corresponding row of the evaluated default-expression column
instead of cloning a single `ScalarValue`.
3. Preserve the fast path: when the default expression is a literal (the
common case today), continue to materialize it once as a scalar to avoid
per-row evaluation overhead.
4. Update the field/return-type derivation in `field` so the
default expression's data type still drives result-type unification (the
existing `NULL_FIELD` fallback already covers the `Literal(NULL)`
default).
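Points 1 to 3 can be sketched together under loud assumptions. `DefaultValue` and `shift_with_default` are hypothetical names, not DataFusion APIs. The evaluated default column is a `Vec<Option<i64>>` standing in for the array a `PhysicalExpr` would produce. A single signed offset (positive for `lag`, negative for `lead`) is a choice made for this sketch only:

```rust
/// Hypothetical stand-in for the evaluator's default: either a literal
/// materialized once (fast path) or a column evaluated per row.
enum DefaultValue {
    Scalar(Option<i64>),
    PerRow(Vec<Option<i64>>),
}

impl DefaultValue {
    fn get(&self, row: usize) -> Option<i64> {
        match self {
            DefaultValue::Scalar(v) => *v,         // literal: same value for every row
            DefaultValue::PerRow(col) => col[row], // expression: value from the current row
        }
    }
}

/// Hypothetical shift: positive `offset` = LAG, negative = LEAD.
/// Rows whose source row falls outside the partition take `default`.
fn shift_with_default(values: &[Option<i64>], offset: i64, default: &DefaultValue) -> Vec<Option<i64>> {
    let n = values.len() as i64;
    (0..n)
        .map(|i| {
            let src = i - offset;
            if src >= 0 && src < n {
                values[src as usize]
            } else {
                default.get(i as usize)
            }
        })
        .collect()
}

fn main() {
    let a = vec![Some(1), Some(2), Some(3)];
    let lit = DefaultValue::Scalar(Some(0));
    let col = DefaultValue::PerRow(vec![Some(100), Some(200), Some(300)]);
    println!("{:?}", shift_with_default(&a, 1, &lit)); // LAG(a, 1, 0)
    println!("{:?}", shift_with_default(&a, -1, &col)); // LEAD(a, 1, c)
}
```

Keeping `Scalar` as its own variant is what preserves the fast path in point 3: a literal is never expanded into a per-row column. The ignore-nulls path and the return-type unification in point 4 are not modeled here.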
### Describe the solution you'd like
_No response_
### Describe alternatives you've considered
_No response_
### Additional context
_No response_