andygrove opened a new pull request, #4640: URL: https://github.com/apache/datafusion-comet/pull/4640
## Which issue does this PR close?

Part of #4596 (the `concat` candidate, the last one on the list).

## Rationale for this change

`CometConcat` reports `Incompatible` when any child uses a non-default (non-UTF8_BINARY) collation: Spark 4.0+ widens `concat` to accept collated strings and preserves the collation in the result type, but the native concat UDF always produces UTF8_BINARY and loses it. With `allowIncompatible` unset, the whole projection then falls back to Spark.

`Concat` has a real Spark `doGenCode` and string input/output types, so it is eligible for the `CodegenDispatchFallback` path: route the `Incompatible` collated case through the JVM codegen dispatcher (Spark's own `doGenCode` inside the Comet pipeline) so it stays native and matches Spark.

## What changes are included in this PR?

* `CometConcat` mixes in `CodegenDispatchFallback`. The `Unsupported` non-string-input case (binary/array children) is unchanged and still falls back, and default-collation concat is unaffected (still `Compatible`, native).

## How are these changes tested?

The existing `string/collation.sql` (Spark 4.0+) already asserted `expect_fallback(concat does not support non-UTF8_BINARY collations)` for collated `concat`. Those two assertions are replaced with `query`, so they now assert native execution matching Spark for both a `UTF8_LCASE` and a `UNICODE_CI` collated `concat`. The file runs under `CometSqlFileTestSuite` and passes.
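The three outcomes described above can be sketched as a small decision table. This is a language-neutral model in Python, not Comet's Scala code: the names `Child`, `concat_support`, and `plan_concat` are hypothetical, the order in which the non-string and collation checks run is an assumption, and only the default case where `allowIncompatible` is unset is modeled.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Support(Enum):
    COMPATIBLE = auto()
    INCOMPATIBLE = auto()
    UNSUPPORTED = auto()


class Path(Enum):
    NATIVE_UDF = auto()            # native concat UDF
    JVM_CODEGEN_DISPATCH = auto()  # Spark's doGenCode inside the Comet pipeline
    SPARK_FALLBACK = auto()        # whole projection falls back to Spark


@dataclass(frozen=True)
class Child:
    """Hypothetical stand-in for a concat child expression."""
    data_type: str                  # "string", "binary", "array", ...
    collation: str = "UTF8_BINARY"  # only meaningful for strings


def concat_support(children):
    """Classify a concat call the way the PR text describes CometConcat."""
    # Assumption: the non-string check runs before the collation check.
    if any(c.data_type != "string" for c in children):
        return Support.UNSUPPORTED
    if any(c.collation != "UTF8_BINARY" for c in children):
        return Support.INCOMPATIBLE
    return Support.COMPATIBLE


def plan_concat(children, codegen_dispatch_fallback):
    """Pick an execution path, with allowIncompatible assumed unset."""
    support = concat_support(children)
    if support is Support.COMPATIBLE:
        return Path.NATIVE_UDF
    if support is Support.INCOMPATIBLE and codegen_dispatch_fallback:
        return Path.JVM_CODEGEN_DISPATCH
    # Unsupported inputs, or Incompatible without the dispatch mix-in.
    return Path.SPARK_FALLBACK


if __name__ == "__main__":
    collated = [Child("string", "UTF8_LCASE"), Child("string")]
    print(plan_concat(collated, codegen_dispatch_fallback=False).name)
    print(plan_concat(collated, codegen_dispatch_fallback=True).name)
```

Under these assumptions, the only row that changes with this PR is the collated-string one: it moves from Spark fallback to the codegen dispatcher, while binary/array children and default-collation strings keep their previous paths.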
