andygrove opened a new pull request, #4159:
URL: https://github.com/apache/datafusion-comet/pull/4159
## Which issue does this PR close?
Closes #419.
## Rationale for this change
`base64` is a commonly used Spark string function. The expression coverage
doc previously listed it as unsupported, so queries using it fell back to Spark.
## What changes are included in this PR?
- New native function `spark_base64` in
`native/spark-expr/src/string_funcs/base64.rs` that produces padded RFC 4648
base64 with no line breaks; it is wired into `create_comet_physical_fun` as `"base64"`.
- New Scala serdes in
  `spark/src/main/scala/org/apache/comet/serde/strings.scala`:
  - `CometBase64` handles the Spark 3.4 case-class shape (`Base64(child)`). It
    always returns `Incompatible` because Spark 3.4 always chunks the output.
  - `CometBase64StaticInvoke` handles the Spark 3.5+ shape, where `Base64` is
    `RuntimeReplaceable` and arrives as `StaticInvoke(classOf[Base64], "encode",
    Seq(child, Literal(chunkBase64)))`. It returns `Compatible` only when the
    literal `chunkBase64` is `false`, and `Incompatible` otherwise.
- `CometStaticInvoke` now delegates `getSupportLevel` and
`getExprConfigName` to its inner handler so the `Base64`-specific support level
and config name (`spark.comet.expr.Base64.allowIncompatible`) take effect
through the StaticInvoke dispatch path.
- Comet SQL Tests:
  - `spark/src/test/resources/sql-tests/expressions/string/base64.sql` covers
    binary and string columns, literals, NULL, empty input, the SPARK-47307
    58-byte chunking boundary, a 200-byte input, and the full 0x00..0xFF byte
    range.
  - `spark/src/test/resources/sql-tests/expressions/string/base64_chunked_fallback.sql`
    asserts that on Spark 3.5+ Comet falls back to Spark when
    `spark.sql.chunkBase64String.enabled=true` and incompatible expressions have
    not been opted in.
- Coverage doc `docs/source/contributor-guide/spark_expressions_support.md`
updated with audit annotations for Spark 3.4.3 / 3.5.8 / 4.0.1.
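The native function's output contract (padded RFC 4648 base64 with no line breaks) can be sketched in stdlib-only Rust. This is an illustrative encoder, not the actual `spark_base64` implementation, and `encode_chunked` is a hypothetical helper that mimics line-wrapped output for contrast:

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Padded RFC 4648 base64, no line breaks (a sketch of the contract,
/// not the real implementation in base64.rs).
fn encode_plain(input: &[u8]) -> String {
    let mut out = String::new();
    for chunk in input.chunks(3) {
        // Pack up to 3 bytes into a 24-bit group, zero-padding the tail.
        let b = [chunk[0], *chunk.get(1).unwrap_or(&0), *chunk.get(2).unwrap_or(&0)];
        let n = ((b[0] as u32) << 16) | ((b[1] as u32) << 8) | b[2] as u32;
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        // '=' padding when the final group has fewer than 3 input bytes.
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Hypothetical contrast case: the same encoding wrapped at 76 chars with
/// CRLF, roughly the shape of Spark's chunked base64 output (assumption).
fn encode_chunked(input: &[u8]) -> String {
    encode_plain(input)
        .as_bytes()
        .chunks(76)
        .map(|c| std::str::from_utf8(c).unwrap())
        .collect::<Vec<_>>()
        .join("\r\n")
}
```

For a 58-byte input the plain form is a single 80-character string ending in `==`, while the chunked form contains a CRLF, which is why the two modes are incompatible.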
This change was scaffolded with the `implement-comet-expression` Claude
skill and the resulting implementation was reviewed with the
`audit-comet-expression` skill.
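The support-level decision for the Spark 3.5+ `StaticInvoke` shape can be modeled roughly as below. The types are simplified stand-ins for illustration, not the real Spark or Comet classes:

```rust
// Simplified, self-contained model (not the real Comet Scala serde) of the
// support-level decision for StaticInvoke(classOf[Base64], "encode", ...).
#[derive(Debug, PartialEq)]
enum SupportLevel {
    Compatible,
    Incompatible(&'static str),
}

// Stand-ins for the relevant pieces of Spark's expression tree.
#[allow(dead_code)]
enum Expr {
    Column,
    BoolLiteral(bool),
}

struct StaticInvoke {
    class_name: &'static str,
    function_name: &'static str,
    args: Vec<Expr>,
}

fn base64_support_level(call: &StaticInvoke) -> SupportLevel {
    match (call.class_name, call.function_name, call.args.as_slice()) {
        // Compatible only when the chunkBase64 literal is false.
        ("Base64", "encode", [_, Expr::BoolLiteral(false)]) => SupportLevel::Compatible,
        ("Base64", "encode", [_, Expr::BoolLiteral(true)]) => {
            SupportLevel::Incompatible("Comet does not chunk base64 output")
        }
        _ => SupportLevel::Incompatible("unsupported StaticInvoke shape"),
    }
}
```

The key design point is that the decision keys on the literal second argument, so a non-literal or `true` chunking flag always falls back.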
## How are these changes tested?
- New Comet SQL Tests under
`spark/src/test/resources/sql-tests/expressions/string/` cover both the
compatible (`chunkBase64String.enabled=false`) and the fallback
(`chunkBase64String.enabled=true`) paths.
- New Rust unit tests in `native/spark-expr/src/string_funcs/base64.rs`
cover array, scalar, NULL, and padding cases.
- `make format`, `cargo clippy --all-targets --workspace -- -D warnings`,
and the targeted `CometSqlFileTestSuite` runs all pass locally.
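The 58-byte boundary exercised by the SQL tests follows from simple arithmetic, assuming (as with Apache Commons-style chunking) that chunked output wraps every 76 characters:

```rust
// Each 3-byte input group encodes to 4 output chars, so 57 input bytes fill
// exactly one 76-char line; 58 bytes is the first length needing a second line.
fn encoded_len(input_bytes: usize) -> usize {
    input_bytes.div_ceil(3) * 4
}

// Number of lines in chunked output, assuming a 76-char wrap width.
fn chunked_lines(input_bytes: usize) -> usize {
    encoded_len(input_bytes).div_ceil(76).max(1)
}
```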
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]