JeelRajodiya opened a new pull request, #21331: URL: https://github.com/apache/datafusion/pull/21331
**Rationale**

The `datafusion-spark` crate is missing the `encode` function. Spark's [`encode(expr, charset)`](https://spark.apache.org/docs/latest/api/sql/index.html#encode) converts a string or binary value into binary using a specified character encoding. It is commonly used in Spark SQL workloads and is needed by engines built on DataFusion that target Spark compatibility.

**What changes are included in this PR?**

Adds `SparkEncode` to `datafusion-spark`'s string functions. It supports the **US-ASCII, ISO-8859-1, UTF-8, UTF-16, UTF-16BE, and UTF-16LE** charsets. Binary input is handled via lossy UTF-8 conversion (invalid bytes are replaced with U+FFFD), matching Spark/Databricks behavior.

**Are these changes tested?**

Yes: 15 unit tests covering all charsets, case-insensitive charset matching, null handling, binary input with lossy UTF-8, Utf8View columns, unsupported charset errors, and return field nullability.

**Are there any user-facing changes?**

A new `encode` scalar function is available when using `datafusion-spark`.
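To illustrate the behavior described above, here is a minimal standalone sketch of charset encoding in plain Rust. It is not the PR's `SparkEncode` code: `encode_sketch` is a hypothetical helper, and two details are assumptions rather than facts from the PR, namely that unmappable characters become `?` for US-ASCII and ISO-8859-1, and that plain `UTF-16` emits a big-endian byte order mark (both modeled on the JVM's usual charset behavior).

```rust
/// Hypothetical helper for illustration only; not the PR's `SparkEncode`.
/// Takes raw bytes so that both string and binary input can be shown.
fn encode_sketch(input: &[u8], charset: &str) -> Result<Vec<u8>, String> {
    // Binary input: invalid UTF-8 sequences become U+FFFD (lossy conversion).
    let text = String::from_utf8_lossy(input);

    // Charset names are matched case-insensitively.
    match charset.to_ascii_uppercase().as_str() {
        "UTF-8" => Ok(text.as_bytes().to_vec()),
        // Assumption: unmappable characters are replaced with '?'.
        "US-ASCII" => Ok(text
            .chars()
            .map(|c| if c.is_ascii() { c as u8 } else { b'?' })
            .collect()),
        // ISO-8859-1 maps code points 0x00..=0xFF directly to one byte each.
        "ISO-8859-1" => Ok(text
            .chars()
            .map(|c| if (c as u32) <= 0xFF { c as u8 } else { b'?' })
            .collect()),
        "UTF-16BE" => Ok(text.encode_utf16().flat_map(|u| u.to_be_bytes()).collect()),
        "UTF-16LE" => Ok(text.encode_utf16().flat_map(|u| u.to_le_bytes()).collect()),
        // Assumption: plain UTF-16 is big-endian with a leading BOM (FE FF).
        "UTF-16" => {
            let mut out = vec![0xFE, 0xFF];
            out.extend(text.encode_utf16().flat_map(|u| u.to_be_bytes()));
            Ok(out)
        }
        other => Err(format!("unsupported charset: {other}")),
    }
}
```

Under these assumptions, `encode_sketch(b"A", "utf-16le")` yields the two bytes `41 00`, and an unrecognized charset name returns an error instead of silently falling back to UTF-8.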
