hcrosse opened a new pull request, #21689:
URL: https://github.com/apache/datafusion/pull/21689

   ## Which issue does this PR close?
   
   - Closes #18020
   
   ## Rationale for this change
   
   `concat(array, array, ...)` in SQL dispatches to the string `concat` UDF, 
which only handles string and binary types. With array arguments it falls 
through to a debug-only path and returns wrong results.
   
   Two earlier attempts were rejected. 
[#18137](https://github.com/apache/datafusion/pull/18137) changed 
`ConcatFunc`'s signature to accept arrays and broke `simplify_expressions`. 
[#18105](https://github.com/apache/datafusion/pull/18105) duplicated 
`array_concat` logic inside `ConcatFunc`.
   
   This PR rewrites `concat(array, ...)` to `array_concat(array, ...)` at the 
analyzer phase. Every logical plan gets the corrected behavior regardless of 
frontend, the string `concat` signature stays untouched, and no array logic is 
duplicated.
   
   ## What changes are included in this PR?
   
   A new `ConcatArrayRewrite` `FunctionRewrite` lives in 
`datafusion-functions-nested`. It detects calls to `ConcatFunc` by identity 
check via `Any::is::<ConcatFunc>`, so user-level shadowing of `concat` such as 
Spark's variant is unaffected. When all args resolve to `List`, `LargeList`, or 
`FixedSizeList` it rewrites to `array_concat_udf()`. Mixed array and non-array 
returns a `plan_err`.
   
   The rewrite is wired into 
`SessionStateDefaults::default_function_rewrites()` and registered on the 
analyzer in `SessionStateBuilder::with_default_features()`, which is the actual 
default-init path. It's also registered via 
`FunctionRegistry::register_function_rewrite` in 
`functions_nested::register_all` as a fallback for custom registries.
   
   ## Are these changes tested?
   
   New SLT coverage in `array/array_concat.slt`:
   
   - 2- and 3-argument array concat, table-column concat, arrays with NULLs, 
string arrays
   - `LargeList` and `FixedSizeList` inputs
   - `NULL::integer[]` at either position, all-NULL case
   - Two error cases for mixed array and non-array
   
   `explain.slt` is updated to reflect the new `apply_function_rewrites` line 
that now appears in `EXPLAIN VERBOSE` output.
   
   ## Are there any user-facing changes?
   
   `concat(array, ...)` now returns correct `array_concat` results instead of 
the prior wrong output. No public API changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to