andygrove opened a new pull request, #4159:
URL: https://github.com/apache/datafusion-comet/pull/4159

   ## Which issue does this PR close?
   
   Closes #419.
   
   ## Rationale for this change
   
   `base64` is a commonly used Spark string function. The expression coverage 
doc previously listed it as unsupported, so queries using it fell back to Spark.
   
   ## What changes are included in this PR?
   
   - New native function `spark_base64` in 
`native/spark-expr/src/string_funcs/base64.rs` that produces padded RFC 4648 
base64 with no line breaks. It is wired into `create_comet_physical_fun` as 
`"base64"`.
   - New Scala serdes in 
`spark/src/main/scala/org/apache/comet/serde/strings.scala`:
     - `CometBase64` for the Spark 3.4 case-class shape (`Base64(child)`). It 
always returns `Incompatible` because Spark 3.4 unconditionally chunks the 
output.
     - `CometBase64StaticInvoke` for the Spark 3.5+ shape, where `Base64` is 
`RuntimeReplaceable` and arrives as `StaticInvoke(classOf[Base64], "encode", 
Seq(child, Literal(chunkBase64)))`. Returns `Compatible` only when the literal 
`chunkBase64` is `false`; otherwise `Incompatible`.
   - `CometStaticInvoke` now delegates `getSupportLevel` and 
`getExprConfigName` to its inner handler so the `Base64`-specific support level 
and config name (`spark.comet.expr.Base64.allowIncompatible`) take effect 
through the StaticInvoke dispatch path.
   - Comet SQL Tests:
     - `spark/src/test/resources/sql-tests/expressions/string/base64.sql` 
covers binary and string columns, literals, NULL, empty input, the SPARK-47307 
58-byte chunking boundary, a 200-byte input, and the full 0x00..0xFF byte range.
     - `spark/src/test/resources/sql-tests/expressions/string/base64_chunked_fallback.sql` 
asserts that on Spark 3.5+ Comet falls back to Spark when 
`spark.sql.chunkBase64String.enabled=true` and incompatible expressions have 
not been opted in.
   - Coverage doc `docs/source/contributor-guide/spark_expressions_support.md` 
updated with audit annotations for Spark 3.4.3 / 3.5.8 / 4.0.1.
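   The target output format can be illustrated with a standalone sketch. The 
real implementation in `native/spark-expr/src/string_funcs/base64.rs` goes 
through DataFusion's array/scalar machinery; this minimal encoder (names and 
structure are illustrative, not the actual code) only demonstrates the 
byte-level format Comet targets, namely padded RFC 4648 base64 on a single 
line:

   ```rust
   // Standard base64 alphabet from RFC 4648 section 4.
   const ALPHABET: &[u8; 64] =
       b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

   /// Encode bytes as padded RFC 4648 base64 with no line breaks,
   /// matching what `spark_base64` is described as producing.
   fn base64_no_chunking(input: &[u8]) -> String {
       let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
       for chunk in input.chunks(3) {
           // Pack up to 3 input bytes into a 24-bit group (zero-padded).
           let b = [chunk[0], *chunk.get(1).unwrap_or(&0), *chunk.get(2).unwrap_or(&0)];
           let n = ((b[0] as u32) << 16) | ((b[1] as u32) << 8) | (b[2] as u32);
           // Emit four 6-bit symbols; '=' pads out short final groups.
           out.push(ALPHABET[(n >> 18) as usize & 63] as char);
           out.push(ALPHABET[(n >> 12) as usize & 63] as char);
           out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
           out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
       }
       out
   }

   fn main() {
       println!("{}", base64_no_chunking(b"Spark")); // U3Bhcms=
       // 58 input bytes encode to 80 base64 chars, the first length past the
       // 76-character line limit that a chunked (MIME-style) encoder wraps
       // at; Comet's output stays on one line regardless of length.
       let long = vec![b'a'; 58];
       assert_eq!(base64_no_chunking(&long).len(), 80);
       assert!(!base64_no_chunking(&long).contains('\n'));
   }
   ```

   This is also why the 58-byte case appears in the SQL tests: it is the 
smallest input where chunked and non-chunked encoders diverge.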
   
   This change was scaffolded with the `implement-comet-expression` Claude 
skill, and the resulting implementation was reviewed with the 
`audit-comet-expression` skill.
   
   ## How are these changes tested?
   
   - New Comet SQL Tests under 
`spark/src/test/resources/sql-tests/expressions/string/` cover both the 
compatible (`chunkBase64String.enabled=false`) and the fallback 
(`chunkBase64String.enabled=true`) paths.
   - New Rust unit tests in `native/spark-expr/src/string_funcs/base64.rs` 
cover array, scalar, NULL, and padding cases.
   - `make format`, `cargo clippy --all-targets --workspace -- -D warnings`, 
and the targeted `CometSqlFileTestSuite` runs all pass locally.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

