dangdat1111 opened a new pull request, #23050:
URL: https://github.com/apache/datafusion/pull/23050

   ## Which issue does this PR close?
   
   - Closes #23041.
   
   ## Rationale for this change
   
   Comparison operators already coerce a string operand to the other operand's 
numeric type (via `string_numeric_coercion`), so `col < '5'` works numerically. 
Arithmetic operators did not, so `1 + '1'` failed during planning with:
   
   ```
   Error during planning: Cannot coerce arithmetic expression Int64 + Utf8 to valid types
   ```
   
   This change makes arithmetic consistent with comparison, and aligns the 
engine with its own documented non-ANSI mode behavior, which states that 
implicit casts between types are allowed (e.g. string to integer when possible).
   
   ## What changes are included in this PR?
   
   - In `BinaryTypeCoercer::signature_inner`, add a `string_numeric_coercion` 
fallback to the `Plus | Minus | Multiply | Divide | Modulo` branch. The string 
operand is coerced to the numeric type of the other operand (e.g. 
`Int64 + Utf8` -> both `Int64`, `Utf8 + Float64` -> both `Float64`). The type coercion 
analyzer then inserts the cast on the string operand.
   - Scope is intentionally limited:
     - `string + string` (e.g. `'1' + '2'`) still errors because the target 
type is ambiguous (matches PostgreSQL).
     - Temporal/string pairs (e.g. `Timestamp + Utf8`, `Interval + Utf8`) are 
unaffected, since those types are not numeric and so `string_numeric_coercion` 
does not apply.
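
The rule described above can be sketched in isolation. This is a minimal illustration, not the code in this PR: the `Ty` enum is a hypothetical stand-in for Arrow's `DataType`, and `coerce_string_numeric` is a hypothetical helper that only models the behavior stated in the bullets (a numeric/string pair resolves to the numeric type; every other pair is left to the existing rules or errors).

```rust
// Illustrative model only: `Ty` is a tiny stand-in for Arrow's DataType,
// not a DataFusion type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Ty {
    Int64,
    Float64,
    Utf8,
    Timestamp,
}

impl Ty {
    fn is_numeric(self) -> bool {
        matches!(self, Ty::Int64 | Ty::Float64)
    }

    fn is_string(self) -> bool {
        matches!(self, Ty::Utf8)
    }
}

/// Hypothetical helper mirroring the described fallback for
/// `Plus | Minus | Multiply | Divide | Modulo`.
/// Returns the common type both operands are coerced to, or `None`
/// when the string/numeric fallback does not apply.
fn coerce_string_numeric(lhs: Ty, rhs: Ty) -> Option<Ty> {
    match (lhs, rhs) {
        // numeric op string: the string side is cast to the numeric type
        (l, r) if l.is_numeric() && r.is_string() => Some(l),
        // string op numeric: same rule, operands swapped
        (l, r) if l.is_string() && r.is_numeric() => Some(r),
        // string op string, temporal op string, etc.: fallback does not apply
        _ => None,
    }
}
```

In this model, `coerce_string_numeric(Ty::Int64, Ty::Utf8)` yields `Some(Ty::Int64)`, while `Utf8 + Utf8` and `Timestamp + Utf8` both yield `None`, matching the scope limits listed above.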
   
   ## Are these changes tested?
   
   Yes.
   - Unit tests in 
`datafusion/expr-common/src/type_coercion/binary/tests/arithmetic.rs`: new 
`test_type_coercion_arithmetic_string_numeric` (all 5 operators across 
`Utf8`/`LargeUtf8`/`Utf8View`, both operand orders, result-type checks, and the 
`Utf8 + Utf8` error case). `test_coercion_error` was updated to use 
`Boolean + Boolean`, since `Float32 + Utf8` is now valid.
   - sqllogictests in 
`datafusion/sqllogictest/test_files/string_numeric_coercion.slt`: a new 
arithmetic section covering `1 + '1'`, `'1' + 1`, all 5 operators over an 
integer and a float column, `arrow_typeof` of the results, an `EXPLAIN` showing 
the string literal is cast to the numeric type, a runtime cast error for a 
non-numeric string, and the plan-time error for `'1' + '2'`.
   
   ## Are there any user-facing changes?
   
   Yes. Arithmetic expressions that mix a numeric and a string operand now plan 
and execute (the string is cast to the numeric type) instead of failing during 
planning. There are no breaking changes to public APIs.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

