theirix opened a new pull request, #22244: URL: https://github.com/apache/datafusion/pull/22244
## Which issue does this PR close? - Closes #12709. ## Rationale for this change While #21883 introduced binary argument support for the pipe operator, this PR targets three UDFs: `concat`, `concat_ws`, and Spark's `concat` to harmonise all their behaviour. We decided to support only string+string and binary+binary and ban mixed operations, to match [the behaviour of the majority of engines](https://github.com/apache/datafusion/issues/12709#issuecomment-4194430343). Previously, mixed behaviour was allowed in #20787. ## What changes are included in this PR? - Added support for binary types (four variants) - Disallowed mixed string/binary operations - Fixed edge cases when binary->string type coercion overrode UDF rules, so mixed calls were allowed - Refactored the three UDFs by extracting duplicate code into shared helpers to keep the logic centralised - Refined `concat_ws` behavior for different separator types The diff is quite large. Detailed code changes: - Added a trait `ConcatBuilder` to abstract string/binary/view/array operations - Abstracted `StringArray`/`LargeStringArray` into a generic `ConcatGenericStringBuilder` - less duplication - Introduced mirrored builders for binary types in a new file `binaries.rs` - Introduced more `ColumnarValueRef` variants to handle nullable and non-nullable binary types - Extracted the ColumnarValue -> ColumnarValueRef builder to `from_columnar_value` - simplified call sites. Scalar code path stays mostly the same - Switched from `Signature::variadic` to `Signature::UserDefined` to allow different argument types.`Variadic` required every argument (including binaries) to be coerced to the same string type, so the UDF cannot distinguish between binary and string inputs. It's a relatively uncommon - happy to discuss - Simplified Spark's concat significantly by reusing the `concat` implementation, so it handles only Spark-specific null-handling - Moved SLTs from Spark SLT (`spark/concat.slt`) to a generic SLT, so we test both generic and Spark-specific behaviour ## Are these changes tested? - Added more unit tests for previously uncovered major code paths - Added more SLTs, especially for type coercion ## Are there any user-facing changes? No -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
