malinjawi opened a new pull request, #11437:
URL: https://github.com/apache/incubator-gluten/pull/11437

   
   
   ## What changes are proposed in this pull request?
   
   This PR implements ANSI-compliant string-to-boolean casting for the Velox backend, addressing part of issue #10134 (ANSI mode support).
   
   **Key Changes:**
   1. **C++ Implementation**: Added `CastStringToBooleanAnsi.h` with a custom Velox function that implements Spark's ANSI cast semantics for string-to-boolean conversion
   2. **Function Registration**: Registered `spark_cast_string_to_boolean_ansi` function in Velox's function registry
   3. **Scala Integration**: Updated `CastTransformer` to detect ANSI mode and route string-to-boolean casts to the custom function
   4. **Literal Optimization**: Added compile-time evaluation for literal casts to improve performance
   5. **Test Coverage**: Added comprehensive test suites for both ANSI and non-ANSI modes
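
The routing described in item 3 can be pictured with a small sketch. The actual logic lives in the Scala `CastTransformer`; the C++ below is only an illustration, and `TypeKind`, `selectCastFunction`, and the `"cast"` fallback name are assumptions made for this example, not names taken from the PR.

```cpp
#include <string>

// Hypothetical, simplified type tags; the real code works on Spark data types.
enum class TypeKind { kString, kBoolean, kOther };

// Hypothetical selector: returns the name of the function a cast is routed to.
// Only string-to-boolean casts in ANSI mode go to the custom function;
// every other cast keeps the standard cast (called "cast" here by assumption).
inline std::string selectCastFunction(TypeKind from, TypeKind to, bool ansiEnabled) {
  if (ansiEnabled && from == TypeKind::kString && to == TypeKind::kBoolean) {
    return "spark_cast_string_to_boolean_ansi";
  }
  return "cast";
}
```

With this shape, disabling ANSI mode leaves existing cast handling untouched, which is consistent with the manual test notes in this PR.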
   
   **Behavior:**
   - In ANSI mode, the cast accepts the following strings case-insensitively: `t`, `true`, `y`, `yes`, `1` (true) and `f`, `false`, `n`, `no`, `0` (false)
   - Invalid inputs fail via `VELOX_USER_FAIL` with a descriptive error message
   - Leading and trailing whitespace is trimmed before validation
   - The behavior is intended to match Spark's ANSI cast; the tests check parity against vanilla Spark
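
The rules above can be sketched in standalone C++17. This is not the code in `CastStringToBooleanAnsi.h`: `trimAscii` and `castStringToBooleanAnsi` are hypothetical names, `std::invalid_argument` stands in for the Velox user error, trimming is assumed to cover ASCII whitespace only, and the error text is made up for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical helper: strips leading and trailing ASCII whitespace.
inline std::string_view trimAscii(std::string_view s) {
  const auto isSpace = [](unsigned char c) { return std::isspace(c) != 0; };
  while (!s.empty() && isSpace(s.front())) s.remove_prefix(1);
  while (!s.empty() && isSpace(s.back())) s.remove_suffix(1);
  return s;
}

// Hypothetical stand-in for the ANSI cast: returns the boolean value or
// throws on any string outside the accepted set.
inline bool castStringToBooleanAnsi(std::string_view input) {
  // Trim first, then lower-case so matching is case-insensitive.
  std::string key(trimAscii(input));
  std::transform(key.begin(), key.end(), key.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  if (key == "t" || key == "true" || key == "y" || key == "yes" || key == "1") {
    return true;
  }
  if (key == "f" || key == "false" || key == "n" || key == "no" || key == "0") {
    return false;
  }
  // Assumption: the real function reports this through VELOX_USER_FAIL.
  throw std::invalid_argument(
      "Cannot cast '" + std::string(input) + "' to BOOLEAN in ANSI mode");
}
```

In this sketch `castStringToBooleanAnsi("  YES ")` yields `true`, while an input such as `"maybe"` or an empty string throws instead of producing a null.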
   
   Part of #10134 (covers only the string-to-boolean cast component)
   
   ## How was this patch tested?
   
   **Test Coverage:**
   1. **ANSI Mode Tests** (`CastStringToBooleanAnsiValidateSuite.scala`):
      - Valid true/false string variations (case-insensitive)
      - Invalid strings that should throw exceptions
      - Null handling
      - Mixed valid/invalid values
      - WHERE clause filtering
      - Whitespace handling
   
   2. **Non-ANSI Mode Tests** (`CastStringToBooleanValidateSuite.scala`):
      - Valid string conversions
      - Invalid strings returning null (non-ANSI behavior)
      - Mixed valid/invalid/null values
      - Empty and whitespace strings
      - All valid boolean string variations
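
For contrast with ANSI mode, the non-ANSI cases listed above (invalid, empty, and whitespace-only strings becoming null) can be sketched over a small column of values. This is an illustrative C++17 model, not the Velox cast: `tryCastStringToBoolean` and `castColumn` are hypothetical names, and `std::nullopt` models SQL NULL.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical non-ANSI cast: null in, null out; invalid strings become null.
inline std::optional<bool> tryCastStringToBoolean(
    const std::optional<std::string>& input) {
  if (!input.has_value()) return std::nullopt;
  // Lower-case a copy so matching is case-insensitive.
  std::string key;
  key.reserve(input->size());
  for (unsigned char c : *input) key.push_back(static_cast<char>(std::tolower(c)));
  // Trim ASCII whitespace; empty or all-whitespace input is treated as invalid.
  static const char* kWhitespace = " \t\n\r\f\v";
  const std::size_t first = key.find_first_not_of(kWhitespace);
  if (first == std::string::npos) return std::nullopt;
  const std::size_t last = key.find_last_not_of(kWhitespace);
  key = key.substr(first, last - first + 1);

  static const std::array<std::string_view, 5> kTrue = {"t", "true", "y", "yes", "1"};
  static const std::array<std::string_view, 5> kFalse = {"f", "false", "n", "no", "0"};
  if (std::find(kTrue.begin(), kTrue.end(), key) != kTrue.end()) return true;
  if (std::find(kFalse.begin(), kFalse.end(), key) != kFalse.end()) return false;
  return std::nullopt;  // Non-ANSI: no exception, just null.
}

// Applies the cast element-wise, as a columnar engine would per row.
inline std::vector<std::optional<bool>> castColumn(
    const std::vector<std::optional<std::string>>& column) {
  std::vector<std::optional<bool>> out;
  out.reserve(column.size());
  for (const auto& value : column) out.push_back(tryCastStringToBoolean(value));
  return out;
}
```

A column holding `"yes"`, `"maybe"`, a null, `"  "`, and `" 0 "` maps to true, null, null, null, and false in this model, so a single bad row does not fail the whole query the way it would in ANSI mode.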
   
   **Validation:**
   - All tests compare Gluten results against vanilla Spark to ensure behavioral parity
   - Tests verify that the custom ANSI function is used in the execution plan
   - Tests confirm proper fallback behavior when ANSI mode is disabled
   - Error handling matches Spark's exception messages and behavior
   
   **Manual Testing:**
   - Tested with `spark.sql.ansi.enabled=true` and `false`
   - Verified execution plans use `spark_cast_string_to_boolean_ansi` in ANSI mode
   - Confirmed standard Velox cast is used in non-ANSI mode
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

