[ https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819789#comment-17819789 ]
Bruce Robbins commented on SPARK-47134:
---------------------------------------

Oddly, I cannot reproduce this on either 3.4.1 or 3.5.0. Also, my 3.4.1 plan doesn't look like your 3.4.1 plan: my plan uses {{sum}}, while yours uses {{decimalsum}}. I can't find where {{decimalsum}} comes from in the code base, but maybe I am not looking hard enough.
{noformat}
scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS
ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]

scala> ds.createOrReplaceTempView("t")

scala> spark.sql("select CAST(SUM(1.00000000000000) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show()
+--------------------+
|                  ct|
+--------------------+
| 9508.00000000000000|
|13879.00000000000000|
+--------------------+

scala> spark.sql("select CAST(SUM(1.00000000000000) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").explain
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- Sort [ct#19 ASC NULLS FIRST], true, 0
   +- Exchange rangepartitioning(ct#19 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [plan_id=68]
      +- HashAggregate(keys=[_1#2], functions=[sum(1.00000000000000)])
         +- Exchange hashpartitioning(_1#2, 200), ENSURE_REQUIREMENTS, [plan_id=65]
            +- HashAggregate(keys=[_1#2], functions=[partial_sum(1.00000000000000)])
               +- LocalTableScan [_1#2]

scala> sql("select version()").show(false)
+----------------------------------------------+
|version()                                     |
+----------------------------------------------+
|3.4.1 6b1ff22dde1ead51cbf370be6e48a802daae58b6|
+----------------------------------------------+

scala>
{noformat}
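On the {{decimalsum}} question above: if it comes from a session extension or a registered UDAF rather than from Spark itself, it should be visible in the function registry. A quick check, just a sketch (the substring filter is my assumption; {{spark.catalog.listFunctions}} is standard API):
{code:scala}
// List registered functions whose name contains "sum"; a function
// registered by a third-party extension or UDAF (e.g. a custom
// "decimalsum") would show up here with its implementing className.
spark.catalog.listFunctions()
  .filter(f => f.name.toLowerCase.contains("sum"))
  .select("name", "className", "isTemporary")
  .show(false)
{code}
If that lists anything beyond the built-in {{sum}}, the differing plan may be coming from the environment rather than from Spark.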
> Unexpected nulls when casting decimal values in specific cases
> --------------------------------------------------------------
>
>                 Key: SPARK-47134
>                 URL: https://issues.apache.org/jira/browse/SPARK-47134
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.1, 3.5.0
>            Reporter: Dylan Walker
>            Priority: Major
>         Attachments: 321queryplan.txt, 341queryplan.txt
>
> In specific cases, casting decimal values can result in `null` values where no overflow exists.
> The cases appear very specific, and I don't have the depth of knowledge to generalize this issue, so here is a simple spark-shell reproduction:
>
> *Setup:*
> {code:scala}
> scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS
> ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]
>
> scala> ds.createOrReplaceTempView("t")
> {code}
>
> *Spark 3.2.1 behaviour (correct):*
> {code:scala}
> scala> spark.sql("select CAST(SUM(1.00000000000000) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show()
> +--------------------+
> |                  ct|
> +--------------------+
> | 9508.00000000000000|
> |13879.00000000000000|
> +--------------------+
> {code}
>
> *Spark 3.4.1 / Spark 3.5.0 behaviour:*
> {code:scala}
> scala> spark.sql("select CAST(SUM(1.00000000000000) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show()
> +-------------------+
> |                 ct|
> +-------------------+
> |               null|
> |9508.00000000000000|
> +-------------------+
> {code}
>
> This is fairly delicate:
> - removing the {{ORDER BY}} clause produces the correct result
> - removing the {{CAST}} produces the correct result
> - changing the number of 0s in the argument to {{SUM}} produces the correct result
> - setting {{spark.sql.ansi.enabled}} to {{true}} produces the correct result (and does not throw an error)
>
> Also, removing the {{ORDER BY}} but writing {{ds}} to a parquet file will also result in the unexpected nulls.
> Please let me know if you need additional information.
> We are also interested in understanding whether setting {{spark.sql.ansi.enabled}} can be considered a reliable workaround for this issue prior to a fix being released, if possible.
> Text files that include the {{explain()}} output are attached.
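Regarding the ANSI workaround asked about above: a minimal session-level sketch (my assumption about how it would be applied, not a confirmed fix; the report itself says ANSI mode produced correct results):
{code:scala}
// Toggle ANSI mode for the current session only; it can also be passed
// at submit time with --conf spark.sql.ansi.enabled=true.
spark.conf.set("spark.sql.ansi.enabled", "true")

// Re-run the reporter's query; with ANSI mode on, the report observed
// the correct sums (9508 and 13879) instead of a null.
spark.sql("""
  SELECT CAST(SUM(1.00000000000000) AS DECIMAL(28,14)) AS ct
  FROM t
  GROUP BY `_1`
  ORDER BY ct ASC
""").show()
{code}
Whether this is a reliable workaround, rather than one that merely happens to sidestep this particular plan, is the open question in the report.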