malinjawi opened a new pull request, #11437:
URL: https://github.com/apache/incubator-gluten/pull/11437
## What changes are proposed in this pull request?
This PR implements ANSI-compliant string-to-boolean casting for the Velox
backend, addressing part of issue #10134 (ANSI mode support).
**Key Changes:**
1. **C++ Implementation**: Added `CastStringToBooleanAnsi.h` with a custom
Velox function that implements Spark's ANSI cast semantics for
string-to-boolean conversion
2. **Function Registration**: Registered `spark_cast_string_to_boolean_ansi`
function in Velox's function registry
3. **Scala Integration**: Updated `CastTransformer` to detect ANSI mode and
route string-to-boolean casts to the custom function
4. **Literal Optimization**: Added compile-time evaluation for literal casts
to improve performance
5. **Test Coverage**: Added comprehensive test suites for both ANSI and
non-ANSI modes
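
The routing decision in item 3 can be illustrated roughly as follows. This is only a sketch: the real logic lives in the Scala `CastTransformer`, and it is written in C++ here purely for illustration. `TypeKind`, `selectCastFunction`, and the fallback name `"cast"` are hypothetical; only `spark_cast_string_to_boolean_ansi` comes from this PR.

```cpp
#include <string>

// Hypothetical, simplified type tags; the real planner works on Spark data types.
enum class TypeKind { kString, kBoolean, kOther };

// Hypothetical helper: picks the function name a cast would be planned as.
// Only string-to-boolean casts in ANSI mode are routed to the custom function;
// everything else stays on the standard cast path.
inline std::string selectCastFunction(TypeKind from, TypeKind to, bool ansiEnabled) {
  if (ansiEnabled && from == TypeKind::kString && to == TypeKind::kBoolean) {
    return "spark_cast_string_to_boolean_ansi";
  }
  return "cast";  // Placeholder name for the standard (non-ANSI) cast.
}
```

Keeping the check this narrow means other casts are unaffected when ANSI mode is switched on.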
**Behavior:**
- In ANSI mode, the cast accepts the following strings, case-insensitively:
`t`, `true`, `y`, `yes`, `1` (true) and `f`, `false`, `n`, `no`, `0` (false)
- Invalid inputs fail via `VELOX_USER_FAIL`, which raises a Velox user error
with a descriptive message
- Whitespace is trimmed before validation
- Matches Spark's ANSI cast behavior exactly
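
As a rough illustration of the behavior listed above, here is a standalone sketch with no Velox dependency. It is not the code in `CastStringToBooleanAnsi.h`. The names `trimAscii` and `castStringToBooleanAnsi` are hypothetical, `std::invalid_argument` stands in for `VELOX_USER_FAIL`, the error text is illustrative, and trimming is assumed to cover ASCII whitespace only.

```cpp
#include <algorithm>
#include <cctype>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical helper: strips leading and trailing ASCII whitespace.
inline std::string_view trimAscii(std::string_view s) {
  auto isSpace = [](unsigned char c) { return std::isspace(c) != 0; };
  while (!s.empty() && isSpace(s.front())) s.remove_prefix(1);
  while (!s.empty() && isSpace(s.back())) s.remove_suffix(1);
  return s;
}

// Hypothetical stand-in for the ANSI cast: returns the boolean or throws.
inline bool castStringToBooleanAnsi(std::string_view input) {
  // Trim first, then lower-case so the comparison is case-insensitive.
  std::string s(trimAscii(input));
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  if (s == "t" || s == "true" || s == "y" || s == "yes" || s == "1") {
    return true;
  }
  if (s == "f" || s == "false" || s == "n" || s == "no" || s == "0") {
    return false;
  }
  // The real function would call VELOX_USER_FAIL here; the message is illustrative.
  throw std::invalid_argument(
      "Cannot cast '" + std::string(input) + "' to BOOLEAN");
}
```

For example, `castStringToBooleanAnsi("  Yes ")` returns `true`, while `castStringToBooleanAnsi("maybe")` throws.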
Fixes #10134 (partial - cast string to boolean component)
## How was this patch tested?
**Test Coverage:**
1. **ANSI Mode Tests** (`CastStringToBooleanAnsiValidateSuite.scala`):
   - Valid true/false string variations (case-insensitive)
   - Invalid strings that should throw exceptions
   - Null handling
   - Mixed valid/invalid values
   - WHERE clause filtering
   - Whitespace handling
2. **Non-ANSI Mode Tests** (`CastStringToBooleanValidateSuite.scala`):
   - Valid string conversions
   - Invalid strings returning null (non-ANSI behavior)
   - Mixed valid/invalid/null values
   - Empty and whitespace strings
   - All valid boolean string variations
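
For contrast, the non-ANSI expectation exercised by the second suite (invalid strings yield null instead of an error) could be modelled as below. This is a sketch, not the standard Velox cast. `tryCastStringToBoolean` is a hypothetical name, `std::nullopt` stands in for SQL `NULL`, and the accepted strings are assumed to be the same set listed for ANSI mode.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical model of the non-ANSI cast: std::nullopt plays the role of NULL.
inline std::optional<bool> tryCastStringToBoolean(std::string_view input) {
  // Strip leading and trailing ASCII whitespace (an assumption of this sketch).
  auto isSpace = [](unsigned char c) { return std::isspace(c) != 0; };
  while (!input.empty() && isSpace(input.front())) input.remove_prefix(1);
  while (!input.empty() && isSpace(input.back())) input.remove_suffix(1);

  // Lower-case a copy so the comparison is case-insensitive.
  std::string s(input);
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });

  if (s == "t" || s == "true" || s == "y" || s == "yes" || s == "1") {
    return true;
  }
  if (s == "f" || s == "false" || s == "n" || s == "no" || s == "0") {
    return false;
  }
  // Empty, whitespace-only, and unrecognized strings all map to NULL.
  return std::nullopt;
}
```

The only difference from the ANSI path is the failure mode: an unrecognized string produces an empty optional rather than an exception.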
**Validation:**
- All tests compare Gluten results against vanilla Spark to ensure
behavioral parity
- Tests verify that the custom ANSI function is used in the execution plan
- Tests confirm proper fallback behavior when ANSI mode is disabled
- Error handling matches Spark's exception messages and behavior
**Manual Testing:**
- Tested with `spark.sql.ansi.enabled=true` and `false`
- Verified execution plans use `spark_cast_string_to_boolean_ansi` in ANSI
mode
- Confirmed standard Velox cast is used in non-ANSI mode
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]