n0r0shi opened a new pull request, #3571:
URL: https://github.com/apache/datafusion-comet/pull/3571

   
   ## Which issue does this PR close?
   
   Closes #419.
   
   ## Rationale for this change
   
   Spark's `base64()` expression is not currently supported by Comet, so queries 
that use it fall back to Spark. This commonly used function maps directly to 
DataFusion's built-in `encode(input, 'base64')` function, so no Rust changes 
are needed.
   
   ## What changes are included in this PR?
   
   Two code paths handle different Spark versions:
   
   - **Spark 3.4**: `Base64` is a direct expression node. Added `CometBase64` 
handler in `strings.scala` and registered it in the `stringExpressions` map.
   - **Spark 3.5+**: `Base64` is `RuntimeReplaceable` — Spark's optimizer 
rewrites it into `StaticInvoke(Base64.encode, [input, chunkBase64])` before 
Comet sees the plan. Added `CometBase64Encode` handler in `statics.scala` to 
handle this.
   
   Both paths produce the same DataFusion call: `encode(input, 'base64')`.
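
As a rough illustration of the two paths described above, here is a toy Python model (not Comet's Scala code). The class names `Base64Expr` and `StaticInvokeBase64` and the function `to_native_call` are hypothetical stand-ins invented for this sketch. Returning `None` is assumed to mean "fall back to Spark". How the Spark 3.4 path treats chunking is not described here, so the sketch only checks the chunk flag on the `StaticInvoke` form.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Base64Expr:
    """Hypothetical stand-in for the direct Base64 node (Spark 3.4)."""
    child: str


@dataclass(frozen=True)
class StaticInvokeBase64:
    """Hypothetical stand-in for StaticInvoke(Base64.encode, [input, chunkBase64])."""
    child: str
    chunk_base64: bool


NativeCall = Tuple[str, str, str]


def to_native_call(expr: Union[Base64Expr, StaticInvokeBase64]) -> Optional[NativeCall]:
    """Map either form to encode(input, 'base64'); None signals fallback (assumed)."""
    if isinstance(expr, StaticInvokeBase64) and expr.chunk_base64:
        # Chunked output is not supported natively, so fall back.
        return None
    return ("encode", expr.child, "base64")


if __name__ == "__main__":
    print(to_native_call(Base64Expr("binary_column")))
    print(to_native_call(StaticInvokeBase64("binary_column", chunk_base64=False)))
    print(to_native_call(StaticInvokeBase64("binary_column", chunk_base64=True)))
```

The point of the sketch is only that both forms converge on one native call, with chunking as the single case that is routed back to Spark.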
   
   **Chunked base64** (`spark.sql.chunkBase64String.enabled=true`, which 
inserts a line break every 76 characters per RFC 2045) is not supported by 
DataFusion's `encode` function, so that case falls back to Spark. I can look 
into adding support on the DataFusion side later.
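
To make the difference between the two output shapes concrete, here is a minimal sketch using Python's standard `base64` module. It is not Spark's or DataFusion's implementation. The helper names `base64_unchunked` and `base64_chunked` are made up for this example, and the CRLF separator is an assumption based on the MIME line-break convention in RFC 2045.

```python
import base64


def base64_unchunked(data: bytes) -> str:
    """Encode as a single base64 string with no line breaks."""
    return base64.b64encode(data).decode("ascii")


def base64_chunked(data: bytes, line_len: int = 76, sep: str = "\r\n") -> str:
    """Encode as base64, then split into lines of at most `line_len` characters.

    The CRLF separator is an assumption (MIME convention), not taken from this PR.
    """
    encoded = base64_unchunked(data)
    lines = [encoded[i:i + line_len] for i in range(0, len(encoded), line_len)]
    return sep.join(lines)


if __name__ == "__main__":
    print(base64_unchunked(b"Spark SQL"))  # U3BhcmsgU1FM

    payload = b"a" * 100  # 100 bytes encode to 136 base64 characters
    chunked = base64_chunked(payload)
    print([len(line) for line in chunked.split("\r\n")])  # [76, 60]
```

For inputs that encode to 76 characters or fewer, both helpers return the same string, so the difference only shows up on longer values.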
   
   ## Are these changes tested?
   
   - Normal base64 encoding: `checkSparkAnswerAndOperator` verifies correct 
results and native Comet execution
   - NULL handling: verified via `checkSparkAnswerAndOperator`
   - Chunked base64 fallback: `checkSparkAnswerAndFallbackReason` verifies 
correct results via Spark fallback and checks the expected fallback reason 
message
   - The Spark 3.4 direct expression handler (`CometBase64`) is exercised when 
CI runs the `spark-3.4` profile. On Spark 3.5+ it is not reached because Spark 
replaces `Base64` with `StaticInvoke` during optimization.
   
   ## Are there any user-facing changes?
   
   Yes. Unchunked `base64()` is now executed natively by Comet instead of 
falling back to Spark, which should improve performance for queries that use 
this function. Chunked base64 still falls back to Spark.
   
   ```sql
   SELECT base64(binary_column) FROM table
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

