[ https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819789#comment-17819789 ]
Bruce Robbins commented on SPARK-47134:
---------------------------------------

Oddly, I cannot reproduce this on either 3.4.1 or 3.5.0. Also, my 3.4.1 plan doesn't look like your 3.4.1 plan: my plan uses {{sum}}, while yours uses {{decimalsum}}. I can't find where {{decimalsum}} comes from in the code base, but maybe I am not looking hard enough.
{noformat}
scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS
ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]

scala> ds.createOrReplaceTempView("t")

scala> spark.sql("select CAST(SUM(1.00000000000000) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show()
+--------------------+
|                  ct|
+--------------------+
| 9508.00000000000000|
|13879.00000000000000|
+--------------------+

scala> spark.sql("select CAST(SUM(1.00000000000000) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").explain
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- Sort [ct#19 ASC NULLS FIRST], true, 0
   +- Exchange rangepartitioning(ct#19 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [plan_id=68]
      +- HashAggregate(keys=[_1#2], functions=[sum(1.00000000000000)])
         +- Exchange hashpartitioning(_1#2, 200), ENSURE_REQUIREMENTS, [plan_id=65]
            +- HashAggregate(keys=[_1#2], functions=[partial_sum(1.00000000000000)])
               +- LocalTableScan [_1#2]

scala> sql("select version()").show(false)
+----------------------------------------------+
|version()                                     |
+----------------------------------------------+
|3.4.1 6b1ff22dde1ead51cbf370be6e48a802daae58b6|
+----------------------------------------------+

scala>
{noformat}
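On the {{decimalsum}} question above: if it comes from a session extension or a registered UDAF rather than from Spark itself, it should be visible in the function registry. A quick check, just a sketch (the substring filter is my assumption; {{spark.catalog.listFunctions}} is standard API):
{code:scala}
// List registered functions whose name contains "sum"; a function
// registered by a third-party extension or UDAF (e.g. a custom
// "decimalsum") would show up here with its implementing className.
spark.catalog.listFunctions()
  .filter(f => f.name.toLowerCase.contains("sum"))
  .select("name", "className", "isTemporary")
  .show(false)
{code}
If that lists anything beyond the built-in {{sum}}, the differing plan may be coming from the environment rather than from Spark.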
> Unexpected nulls when casting decimal values in specific cases
> --------------------------------------------------------------
>
>                 Key: SPARK-47134
>                 URL: https://issues.apache.org/jira/browse/SPARK-47134
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.1, 3.5.0
>            Reporter: Dylan Walker
>            Priority: Major
>         Attachments: 321queryplan.txt, 341queryplan.txt
>
> In specific cases, casting decimal values can result in `null` values where no overflow exists.
> The cases appear very specific, and I don't have the depth of knowledge to generalize this issue, so here is a simple spark-shell reproduction:
>
> *Setup:*
> {code:scala}
> scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS
> ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]
>
> scala> ds.createOrReplaceTempView("t")
> {code}
>
> *Spark 3.2.1 behaviour (correct):*
> {code:scala}
> scala> spark.sql("select CAST(SUM(1.00000000000000) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show()
> +--------------------+
> |                  ct|
> +--------------------+
> | 9508.00000000000000|
> |13879.00000000000000|
> +--------------------+
> {code}
>
> *Spark 3.4.1 / Spark 3.5.0 behaviour:*
> {code:scala}
> scala> spark.sql("select CAST(SUM(1.00000000000000) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show()
> +-------------------+
> |                 ct|
> +-------------------+
> |               null|
> |9508.00000000000000|
> +-------------------+
> {code}
>
> This is fairly delicate:
> - removing the {{ORDER BY}} clause produces the correct result
> - removing the {{CAST}} produces the correct result
> - changing the number of 0s in the argument to {{SUM}} produces the correct result
> - setting {{spark.sql.ansi.enabled}} to {{true}} produces the correct result (and does not throw an error)
>
> Also, removing the {{ORDER BY}} but writing {{ds}} to a parquet file will also result in the unexpected nulls.
> Please let me know if you need additional information.
> We are also interested in understanding whether setting {{spark.sql.ansi.enabled}} can be considered a reliable workaround for this issue prior to a fix being released, if possible.
> Text files that include the {{explain()}} output are attached.
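Regarding the ANSI workaround asked about above: a minimal session-level sketch (my assumption about how it would be applied, not a confirmed fix; the report itself says ANSI mode produced correct results):
{code:scala}
// Toggle ANSI mode for the current session only; it can also be passed
// at submit time with --conf spark.sql.ansi.enabled=true.
spark.conf.set("spark.sql.ansi.enabled", "true")

// Re-run the reporter's query; with ANSI mode on, the report observed
// the correct sums (9508 and 13879) instead of a null.
spark.sql("""
  SELECT CAST(SUM(1.00000000000000) AS DECIMAL(28,14)) AS ct
  FROM t
  GROUP BY `_1`
  ORDER BY ct ASC
""").show()
{code}
Whether this is a reliable workaround, rather than one that merely happens to sidestep this particular plan, is the open question in the report.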