parthchandra opened a new pull request, #4009:
URL: https://github.com/apache/datafusion-comet/pull/4009
## Which issue does this PR close?
Part of https://github.com/apache/datafusion-comet/issues/286
Fixes: #1068
## Rationale for this change
Comet had gaps in Decimal-to-String cast coverage:
Decimal -> String (LEGACY mode): cast_array was falling through to
DataFusion's built-in cast for Decimal128 -> Utf8, which produces plain
notation. Spark's LEGACY mode uses Java BigDecimal.toString(), which produces
scientific notation when the adjusted exponent is less than -6. For example,
Decimal(38,18) zero was cast as
"0.000000000000000000" by Comet but "0E-18" by Spark.
Decimal with negative scale: When
spark.sql.legacy.allowNegativeScaleOfDecimal=true, Spark formats negative-scale
decimals in scientific notation (e.g. Decimal(7,-2) unscaled=123 -> "1.23E+4").
Comet now matches this. When the config is disabled, CometCast.isSupported
returns Incompatible since negative-scale decimals cannot be created
via normal SQL paths.
Other primitive types -> String: Boolean, integer, float, double, date,
timestamp, and binary casts to String were already handled correctly by
DataFusion's built-in cast. The existing castTest harness already exercises
LEGACY, TRY, and ANSI modes for all of these types, so no new tests were needed.
## What changes are included in this PR?
- native/spark-expr/src/conversion_funcs/numeric.rs: adds
cast_decimal128_to_utf8 and decimal128_to_java_string that replicate Java
BigDecimal.toString() behavior (scientific notation when adj_exp < -6).
- native/spark-expr/src/conversion_funcs/cast.rs: adds a LEGACY-only match
arm for Decimal128 -> Utf8 that calls cast_decimal128_to_utf8; TRY and ANSI
fall through to DataFusion's plain-notation cast, which already matches Spark.
- spark/src/main/scala/org/apache/comet/expressions/CometCast.scala: marks
all DecimalType -> StringType as Compatible across all eval modes, with a
special Incompatible case for negative-scale decimals when the legacy config is
disabled.
- Tests: adds cast DecimalType(38,18) to StringType and cast DecimalType
with negative scale to StringType tests. Extends
generateDecimalsPrecision38Scale18 with small-magnitude values that trigger the
scientific notation path.
## How are these changes tested?
- Unit tests in CometCastSuite covering Decimal(38,18), negative-scale
decimals (with localTableScan.enabled to avoid Parquet), and the
CometCast.isSupported boundary for negative-scale with config disabled.
- Rust unit test test_decimal128_to_java_string covering all branches of
the formatting logic (plain, leading zeros, scientific notation,
positive/negative exponent, zero, negatives).
- Existing castTest harness already exercises LEGACY, TRY, and ANSI modes
for all other primitive-to-String casts (boolean, byte, short, int, long,
float, double, date, timestamp, binary), confirming compatibility.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]