parthchandra opened a new pull request, #4009:
URL: https://github.com/apache/datafusion-comet/pull/4009

   ## Which issue does this PR close?
   
     Part of https://github.com/apache/datafusion-comet/issues/286
     Fixes: #1068
   
    ## Rationale for this change
   
     Comet had gaps in Decimal-to-String cast coverage:
   
  Decimal -> String (LEGACY mode): `cast_array` was falling through to DataFusion's built-in cast for Decimal128 -> Utf8, which produces plain notation. Spark's LEGACY mode uses Java `BigDecimal.toString()`, which produces scientific notation when the adjusted exponent is less than -6. For example, Decimal(38,18) zero was cast as "0.000000000000000000" by Comet but "0E-18" by Spark.
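  To make the gap concrete, here is a minimal standalone sketch of plain (non-scientific) decimal formatting, the style DataFusion's built-in Decimal128 -> Utf8 cast emits. The function name is hypothetical, not DataFusion's actual code:

```rust
/// Illustrative sketch of plain decimal notation for an i128 unscaled
/// value and a non-negative scale (hypothetical helper, not DataFusion's
/// actual cast kernel).
fn plain_notation(unscaled: i128, scale: u32) -> String {
    if scale == 0 {
        return unscaled.to_string();
    }
    let sign = if unscaled < 0 { "-" } else { "" };
    // Left-pad with zeros so there is at least one digit before the point.
    let digits = format!("{:0>width$}", unscaled.unsigned_abs(), width = scale as usize + 1);
    let point = digits.len() - scale as usize;
    format!("{sign}{}.{}", &digits[..point], &digits[point..])
}
```

  Under this style, Decimal(38,18) zero comes out as `0.000000000000000000`, while Spark's LEGACY mode produces `0E-18` — the mismatch this PR fixes.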
   
  Decimal with negative scale: When `spark.sql.legacy.allowNegativeScaleOfDecimal=true`, Spark formats negative-scale decimals in scientific notation (e.g. Decimal(7,-2) unscaled=123 -> "1.23E+4"). Comet now matches this. When the config is disabled, `CometCast.isSupported` returns Incompatible, since negative-scale decimals cannot be created via normal SQL paths.
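  Both behaviors follow from the `java.math.BigDecimal.toString()` contract: scientific notation is used when the scale is negative or the adjusted exponent (digit count minus one, minus the scale) is less than -6; otherwise notation is plain. A standalone sketch of that rule (illustrative function name, not Comet's actual implementation):

```rust
/// Sketch of java.math.BigDecimal.toString() formatting for an i128
/// unscaled value and a scale. Hypothetical name; Comet's real code is
/// decimal128_to_java_string.
fn decimal_to_java_string(unscaled: i128, scale: i32) -> String {
    let sign = if unscaled < 0 { "-" } else { "" };
    let digits = unscaled.unsigned_abs().to_string();
    // Adjusted exponent per the BigDecimal.toString() contract.
    let adj_exp = digits.len() as i64 - 1 - scale as i64;

    if scale == 0 {
        // No fractional part: just the digits.
        return format!("{sign}{digits}");
    }
    if scale > 0 && adj_exp >= -6 {
        // Plain notation: insert the decimal point, adding leading
        // zeros when the value is smaller than one.
        return if digits.len() > scale as usize {
            let point = digits.len() - scale as usize;
            format!("{sign}{}.{}", &digits[..point], &digits[point..])
        } else {
            let zeros = "0".repeat(scale as usize - digits.len());
            format!("{sign}0.{zeros}{digits}")
        };
    }
    // Scientific notation: negative scale, or adjusted exponent < -6.
    let mantissa = if digits.len() == 1 {
        digits.clone()
    } else {
        format!("{}.{}", &digits[..1], &digits[1..])
    };
    // Java prints an explicit '+' for non-negative exponents.
    let exp = if adj_exp >= 0 {
        format!("E+{adj_exp}")
    } else {
        format!("E{adj_exp}")
    };
    format!("{sign}{mantissa}{exp}")
}
```

  Under this rule, Decimal(38,18) zero yields "0E-18" and Decimal(7,-2) unscaled=123 yields "1.23E+4", matching the Spark examples above.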
   
     Other primitive types -> String: Boolean, integer, float, double, date, 
timestamp, and binary casts to String were already handled correctly by 
DataFusion's built-in cast. The existing castTest harness already exercises 
LEGACY, TRY, and ANSI modes for all of these types, so no new tests were needed.
   
    ## What changes are included in this PR?
   
  - `native/spark-expr/src/conversion_funcs/numeric.rs`: adds `cast_decimal128_to_utf8` and `decimal128_to_java_string`, which replicate Java `BigDecimal.toString()` behavior (scientific notation when `adj_exp < -6`).
  - `native/spark-expr/src/conversion_funcs/cast.rs`: adds a LEGACY-only match arm for Decimal128 -> Utf8 that calls `cast_decimal128_to_utf8`; TRY and ANSI fall through to DataFusion's plain-notation cast, which already matches Spark.
  - `spark/src/main/scala/org/apache/comet/expressions/CometCast.scala`: marks all DecimalType -> StringType casts as Compatible across all eval modes, with a special Incompatible case for negative-scale decimals when the legacy config is disabled.
  - Tests: adds `cast DecimalType(38,18) to StringType` and `cast DecimalType with negative scale to StringType` tests, and extends `generateDecimalsPrecision38Scale18` with small-magnitude values that trigger the scientific-notation path.
   
   ## How are these changes tested?
   
  - Unit tests in `CometCastSuite` covering Decimal(38,18), negative-scale decimals (with `localTableScan.enabled` to avoid Parquet), and the `CometCast.isSupported` boundary for negative scale with the config disabled.
  - Rust unit test `test_decimal128_to_java_string` covering all branches of the formatting logic (plain, leading zeros, scientific notation, positive/negative exponent, zero, negatives).
  - The existing `castTest` harness already exercises LEGACY, TRY, and ANSI modes for all other primitive-to-String casts (boolean, byte, short, int, long, float, double, date, timestamp, binary), confirming compatibility.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


