[PR] feat: opt concat into codegen dispatch for non-UTF8_BINARY collations [datafusion-comet]

via GitHub Fri, 12 Jun 2026 06:43:19 -0700


andygrove opened a new pull request, #4640:
URL: https://github.com/apache/datafusion-comet/pull/4640


   ## Which issue does this PR close?
   
   Part of #4596 (the `concat` candidate, the last one on the list).
   
   ## Rationale for this change
   
   `CometConcat` reports `Incompatible` when any child uses a non-default 
(non-UTF8_BINARY) collation. Spark 4.0+ widens `concat` to accept collated 
strings and preserves the collation in the result type, but the native concat 
UDF always produces UTF8_BINARY, so the collation is lost. With 
`allowIncompatible` unset, this makes the whole projection fall back to Spark.

   `Concat` has a real Spark `doGenCode` implementation and string input/output 
types, so it is eligible for the `CodegenDispatchFallback` path. This PR routes 
the `Incompatible` collated case through the JVM codegen dispatcher (Spark's own 
`doGenCode` running inside the Comet pipeline), so the projection stays native 
and its results match Spark.
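   The following minimal Scala sketch makes the result-type mismatch concrete. 
`StringType`, `sparkResultType`, `nativeResultType` and `nativeMatchesSpark` are 
hypothetical, simplified stand-ins, not Spark or Comet APIs. The sketch assumes 
that all children share one collation, as in the collated test queries.

   ```scala
   object ConcatResultTypeSketch {
     // Hypothetical stand-in for a string type tagged with a collation name
     // (not Spark's actual StringType class).
     final case class StringType(collation: String)

     val Utf8Binary: StringType = StringType("UTF8_BINARY")

     // Spark 4.0+ keeps the input collation in the concat result type.
     // Assumption: all children share one collation; an empty argument list
     // is modeled as yielding the default UTF8_BINARY.
     def sparkResultType(children: Seq[StringType]): StringType =
       children.headOption.getOrElse(Utf8Binary)

     // The native concat UDF always produces UTF8_BINARY, whatever the inputs are.
     def nativeResultType(children: Seq[StringType]): StringType = Utf8Binary

     // Native execution only matches Spark when both result types agree.
     def nativeMatchesSpark(children: Seq[StringType]): Boolean =
       sparkResultType(children) == nativeResultType(children)
   }
   ```

   For example, two `UTF8_LCASE` inputs give a `UTF8_LCASE` result in Spark but a 
`UTF8_BINARY` result natively. `CometConcat` marks exactly this case as 
`Incompatible`.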
   
   ## What changes are included in this PR?
   
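   * `CometConcat` mixes in `CodegenDispatchFallback`, so its `Incompatible` 
(collated) case is dispatched to Spark's codegen instead of falling back. The 
`Unsupported` case for non-string inputs (binary/array children) is unchanged 
and still falls back. Default-collation `concat` is unaffected: it is still 
`Compatible` and runs natively.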
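   The routing after this change can be sketched as follows. This is a simplified 
model, not Comet's implementation. `SupportLevel`, `ExprSerde`, `Plan` and 
`ConcatSerde` are hypothetical stand-ins. The `CodegenDispatchFallback` trait 
here only mimics the behavior described above. The model covers only the case 
where `allowIncompatible` is unset.

   ```scala
   object CodegenDispatchSketch {
     // Hypothetical, simplified support levels (stand-ins, not Comet's classes).
     sealed trait SupportLevel
     case object Compatible extends SupportLevel
     final case class Incompatible(reason: String) extends SupportLevel
     final case class Unsupported(reason: String) extends SupportLevel

     // Child types reduced to what matters for concat.
     sealed trait ChildType
     final case class StringType(collation: String) extends ChildType
     case object BinaryType extends ChildType

     sealed trait Plan
     case object Native extends Plan             // Comet native execution
     case object JvmCodegenDispatch extends Plan // Spark's doGenCode inside the Comet pipeline
     case object SparkFallback extends Plan      // the projection falls back to Spark

     // Hypothetical serde base. It models allowIncompatible being unset,
     // so by default an Incompatible expression falls back to Spark.
     trait ExprSerde {
       def supportLevel(children: Seq[ChildType]): SupportLevel
       def onIncompatible: Plan = SparkFallback
       final def plan(children: Seq[ChildType]): Plan = supportLevel(children) match {
         case Compatible      => Native
         case Incompatible(_) => onIncompatible
         case Unsupported(_)  => SparkFallback
       }
     }

     // Hypothetical mixin: reroutes only the Incompatible case to codegen dispatch.
     trait CodegenDispatchFallback extends ExprSerde {
       override def onIncompatible: Plan = JvmCodegenDispatch
     }

     // Hypothetical concat serde: non-string children are Unsupported,
     // and collated strings are Incompatible.
     object ConcatSerde extends ExprSerde with CodegenDispatchFallback {
       def supportLevel(children: Seq[ChildType]): SupportLevel =
         if (children.exists { case _: StringType => false; case _ => true })
           Unsupported("non-string input")
         else if (children.exists { case StringType(c) => c != "UTF8_BINARY"; case _ => false })
           Incompatible("concat does not support non-UTF8_BINARY collations")
         else Compatible
     }
   }
   ```

   In this model, `ConcatSerde.plan` returns:

   * `Native` for default-collation inputs.
   * `JvmCodegenDispatch` for `UTF8_LCASE` or `UNICODE_CI` inputs.
   * `SparkFallback` for binary children.

   Keeping the override in a separate trait leaves the `Compatible` and 
`Unsupported` paths untouched.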
   
   ## How are these changes tested?
   
   The existing `string/collation.sql` test file (Spark 4.0+) asserted 
`expect_fallback(concat does not support non-UTF8_BINARY collations)` for 
collated `concat`. Those two assertions are now replaced with `query`. They 
therefore assert native execution that matches Spark for `concat` over both 
`UTF8_LCASE` and `UNICODE_CI` collated strings. Ran with 
`CometSqlFileTestSuite`; the tests pass.
   

