[spark] branch master updated: [SPARK-39749][SQL] Always use plain string representation on casting Decimal to String

gengliang Wed, 13 Jul 2022 00:18:44 -0700

This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new c621df269d8 [SPARK-39749][SQL] Always use plain string representation 
on casting Decimal to String
c621df269d8 is described below

commit c621df269d8bc5c9487cab94407a7186253aa590
Author: Gengliang Wang <[email protected]>
AuthorDate: Wed Jul 13 00:18:23 2022 -0700

    [SPARK-39749][SQL] Always use plain string representation on casting 
Decimal to String
    
    ### What changes were proposed in this pull request?
    
    Currently, casting a Decimal to String produces exponential notation when 
the adjusted exponent is less than -6. This is consistent with 
BigDecimal.toString: 
https://docs.oracle.com/javase/8/docs/api/java/math/BigDecimal.html#toString
    
    After this PR, the casting always uses plain string representation.
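    
    The difference between the two representations can be sketched with plain 
`java.math.BigDecimal`, without Spark. The object and method names below are 
illustrative only and are not part of this patch:
    ```scala
    import java.math.BigDecimal

    // Illustrative helper, not part of the patch.
    object DecimalStrings {
      // BigDecimal.toString switches to scientific notation when the
      // adjusted exponent is less than -6.
      def scientific(s: String): String = new BigDecimal(s).toString

      // BigDecimal.toPlainString never emits an exponent field.
      def plain(s: String): String = new BigDecimal(s).toPlainString

      def main(args: Array[String]): Unit = {
        // 0.000000123: unscaled value 123, scale 9, adjusted exponent -7.
        println(scientific("0.000000123")) // 1.23E-7
        println(plain("0.000000123"))      // 0.000000123
        // 0.00000123: adjusted exponent is exactly -6, so no exponent is used.
        println(scientific("0.00000123"))  // 0.00000123
      }
    }
    ```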
    
    ### Why are the changes needed?
    
    1. The current behavior does not comply with the ANSI SQL standard.
    <img width="918" alt="image" src="https://user-images.githubusercontent.com/1097932/178395756-baecbe90-7a5f-4b4c-b63c-9f1fdf656107.png">
    <img width="603" alt="image" src="https://user-images.githubusercontent.com/1097932/178395567-fa5b6877-ff08-48b5-b715-243c954d6bbc.png">
    
    2. It differs from databases such as PostgreSQL, Oracle, and MS SQL Server.
    3. The current behavior may surprise users, since it only applies when the 
adjusted exponent is less than -6. The following query returns `false` by 
default (when ANSI SQL mode is off) because `0.000000123` is converted to 
`1.23E-7`:
    ```sql
    select '0.000000123' in (0.000000123);
    ```
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes. After this change, Spark SQL always uses the plain string 
representation when casting Decimal to String. To restore the legacy behavior, 
which uses scientific notation if the adjusted exponent is less than -6, set 
`spark.sql.legacy.castDecimalToString.enabled` to `true`.
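    
    A minimal sketch of toggling the flag at runtime, assuming a local 
session on a Spark build that contains this change (3.4.0 or later). The 
object name, app name, and query are illustrative. The unit test in this 
commit expects `0.000000123` by default and `1.23E-7` with the flag enabled:
    ```scala
    import org.apache.spark.sql.SparkSession

    // Illustrative demo object, not part of the patch.
    object CastDecimalToStringDemo {
      def main(args: Array[String]): Unit = {
        // Assumption: Spark 3.4.0+ is on the classpath.
        val spark = SparkSession.builder()
          .master("local[1]")
          .appName("cast-decimal-demo")
          .getOrCreate()

        val query =
          "SELECT CAST(CAST(0.000000123 AS DECIMAL(9, 9)) AS STRING) AS s"

        // Default after this change: plain string representation.
        spark.sql(query).show(false)

        // Opt back in to the legacy behavior (scientific notation).
        spark.conf.set("spark.sql.legacy.castDecimalToString.enabled", "true")
        spark.sql(query).show(false)

        spark.stop()
      }
    }
    ```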
    
    ### How was this patch tested?
    
    Unit test
    
    Closes #37160 from gengliangwang/decimalToString.
    
    Authored-by: Gengliang Wang <[email protected]>
    Signed-off-by: Gengliang Wang <[email protected]>
---
 docs/sql-migration-guide.md                                   |  1 +
 .../org/apache/spark/sql/catalyst/expressions/Cast.scala      |  5 +++++
 .../main/scala/org/apache/spark/sql/internal/SQLConf.scala    | 11 +++++++++++
 .../src/main/scala/org/apache/spark/sql/types/Decimal.scala   |  2 ++
 .../apache/spark/sql/catalyst/expressions/CastSuiteBase.scala |  8 ++++++++
 5 files changed, 27 insertions(+)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index c4e9d1f2a25..e37e12f71eb 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -26,6 +26,7 @@ license: |
   
   - Since Spark 3.4, Number or Number(\*) from Teradata will be treated as 
Decimal(38,18). In Spark 3.3 or earlier, Number or Number(\*) from Teradata 
will be treated as Decimal(38, 0), in which case the fractional part will be 
removed.
   - Since Spark 3.4, v1 database, table, permanent view and function 
identifier will include 'spark_catalog' as the catalog name if database is 
defined, e.g. a table identifier will be: `spark_catalog.default.t`. To restore 
the legacy behavior, set `spark.sql.legacy.v1IdentifierNoCatalog` to `true`.
+  - Since Spark 3.4, the results of casting Decimal values as String type will 
not contain exponential notations. To restore the legacy behavior, which uses 
scientific notation if the adjusted exponent is less than -6, set 
`spark.sql.legacy.castDecimalToString.enabled` to `true`.
 
 ## Upgrading from Spark SQL 3.2 to 3.3
 
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
index 45950607e0d..5dd986e25e7 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
@@ -512,6 +512,7 @@ case class Cast(
     TimestampFormatter.getFractionFormatter(ZoneOffset.UTC)
 
   private val legacyCastToStr = 
SQLConf.get.getConf(SQLConf.LEGACY_COMPLEX_TYPES_TO_STRING)
+  private val legacyCastDecimalToStr = 
SQLConf.get.getConf(SQLConf.LEGACY_DECIMAL_TO_STRING)
   // The brackets that are used in casting structs and maps to strings
   private val (leftBracket, rightBracket) = if (legacyCastToStr) ("[", "]") 
else ("{", "}")
 
@@ -625,6 +626,8 @@ case class Cast(
     case DayTimeIntervalType(startField, endField) =>
       buildCast[Long](_, i => UTF8String.fromString(
         IntervalUtils.toDayTimeIntervalString(i, ANSI_STYLE, startField, 
endField)))
+    case _: DecimalType if !legacyCastDecimalToStr =>
+      buildCast[Decimal](_, d => UTF8String.fromString(d.toPlainString))
     case _ => buildCast[Any](_, o => UTF8String.fromString(o.toString))
   }
 
@@ -1475,6 +1478,8 @@ case class Cast(
             $evPrim = UTF8String.fromString($iu.toDayTimeIntervalString($c, 
$style,
               (byte)${i.startField}, (byte)${i.endField}));
           """
+      case _: DecimalType if !legacyCastDecimalToStr =>
+        (c, evPrim, _) => code"$evPrim = 
UTF8String.fromString($c.toPlainString());"
       case _ =>
         (c, evPrim, evNull) => code"$evPrim = 
UTF8String.fromString(String.valueOf($c));"
     }
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 1b7857ead59..15bd5cb5e36 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -3697,6 +3697,17 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val LEGACY_DECIMAL_TO_STRING =
+    buildConf("spark.sql.legacy.castDecimalToString.enabled")
+      .internal()
+      .doc("When true, casting decimal values as string will use scientific 
notation if an " +
+        "exponent is needed, which is the same with the method 
java.math.BigDecimal.toString(). " +
+        "Otherwise, the casting result won't contain an exponent field, which 
is compliant to " +
+        "the ANSI SQL standard.")
+      .version("3.4.0")
+      .booleanConf
+      .createWithDefault(false)
+
   val LEGACY_PATH_OPTION_BEHAVIOR =
     buildConf("spark.sql.legacy.pathOptionBehavior.enabled")
       .internal()
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala
index 759a5dce967..f4f54d2f93b 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala
@@ -225,6 +225,8 @@ final class Decimal extends Ordered[Decimal] with 
Serializable {
 
   override def toString: String = toBigDecimal.toString()
 
+  def toPlainString: String = toBigDecimal.bigDecimal.toPlainString
+
   def toDebugString: String = {
     if (decimalVal.ne(null)) {
       s"Decimal(expanded, $decimalVal, $precision, $scale)"
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuiteBase.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuiteBase.scala
index 97cbc781829..da9a7dca9f1 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuiteBase.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuiteBase.scala
@@ -1305,4 +1305,12 @@ abstract class CastSuiteBase extends SparkFunSuite with 
ExpressionEvalHelper {
         Cast(child, DecimalType.USER_DEFAULT), it)
     }
   }
+
+  test("SPARK-39749: cast Decimal to string") {
+    val input = Literal.create(Decimal(0.000000123), DecimalType(9, 9))
+    checkEvaluation(cast(input, StringType), "0.000000123")
+    withSQLConf(SQLConf.LEGACY_DECIMAL_TO_STRING.key -> "true") {
+      checkEvaluation(cast(input, StringType), "1.23E-7")
+    }
+  }
 }


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
