[
https://issues.apache.org/jira/browse/SPARK-40129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722224#comment-17722224
]
Jia Fan commented on SPARK-40129:
---------------------------------
https://github.com/apache/spark/pull/41156
> Decimal multiply can produce the wrong answer because it rounds twice
> ---------------------------------------------------------------------
>
> Key: SPARK-40129
> URL: https://issues.apache.org/jira/browse/SPARK-40129
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.0, 3.3.0, 3.4.0
> Reporter: Robert Joseph Evans
> Priority: Major
>
> This looks like it has been around for a long time, but I have reproduced it
> in 3.2.0+.
> The example here multiplies a Decimal(38, 10) by another Decimal(38, 10),
> but I think it can be reproduced with other precision and scale combinations,
> and possibly with divide too.
> {code:java}
> Seq("9173594185998001607642838421.5479932913").toDF
>   .selectExpr("CAST(value as DECIMAL(38,10)) as a")
>   .selectExpr("a * CAST(-12 as DECIMAL(38,10))")
>   .show(truncate = false)
> {code}
> This produces an answer in Spark of
> {{-110083130231976019291714061058.575920}}, but if I do the calculation with
> regular Java BigDecimal I get {{-110083130231976019291714061058.575919}}:
> {code:java}
> import java.math.BigDecimal;
> import java.math.RoundingMode;
>
> BigDecimal l = new BigDecimal("9173594185998001607642838421.5479932913");
> BigDecimal r = new BigDecimal("-12.0000000000");
> BigDecimal prod = l.multiply(r);
> BigDecimal rounded_prod = prod.setScale(6, RoundingMode.HALF_UP);
> {code}
> Spark does essentially all of the same operations, but it uses its own
> Decimal class instead of Java's BigDecimal directly. Spark, by way of
> Decimal, sets a MathContext for the multiply operation with a max precision
> of 38 and half-up rounding. That means the result of the multiply operation
> in Spark is {{{}-110083130231976019291714061058.57591950{}}}, while for the
> Java BigDecimal code the result is
> {{{}-110083130231976019291714061058.57591949560000000000{}}}. Then, in
> CheckOverflow for 3.2.0 and 3.3.0, or in the regular Multiply expression in
> 3.4.0, setScale is called (as part of Decimal.setPrecision). At that point
> the already-rounded number is rounded yet again, resulting in what is
> arguably a wrong answer from Spark.
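> As a sketch, the double rounding can be reproduced directly with BigDecimal
> (the exact product below is taken from the numbers above):
> {code:java}
> import java.math.BigDecimal;
> import java.math.MathContext;
> import java.math.RoundingMode;
>
> // Exact product of the two operands above, before any rounding.
> BigDecimal exact =
>     new BigDecimal("-110083130231976019291714061058.57591949560000000000");
>
> // Step 1: Spark's MathContext(38, HALF_UP) rounds during the multiply,
> // keeping 38 significant digits.
> BigDecimal once = exact.round(new MathContext(38, RoundingMode.HALF_UP));
> // once = -110083130231976019291714061058.57591950
>
> // Step 2: setScale (via Decimal.setPrecision) rounds the already-rounded
> // value again, and the carried-up digit tips it the wrong way.
> BigDecimal twice = once.setScale(6, RoundingMode.HALF_UP);
> // twice = -110083130231976019291714061058.575920  (wrong)
>
> // Rounding the exact product only once gives the right answer.
> BigDecimal correct = exact.setScale(6, RoundingMode.HALF_UP);
> // correct = -110083130231976019291714061058.575919
> {code}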
> I have not fully tested this, but it looks like we could simply remove the
> MathContext in Decimal entirely, or set it to UNLIMITED; all of the decimal
> operations appear to do their own overflow checking and rounding anyway. If
> we want to reduce the total memory usage, we could instead set the max
> precision in the MathContext to 39 and truncate (round down) the result
> there. That would then let setPrecision round the result correctly
> afterwards.
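> A sketch of both suggestions in plain BigDecimal (assuming the final rounding
> stays in setPrecision; a 39-digit truncated intermediate keeps one guard
> digit, which is enough for the half-up decision to match the exact value):
> {code:java}
> import java.math.BigDecimal;
> import java.math.MathContext;
> import java.math.RoundingMode;
>
> BigDecimal l = new BigDecimal("9173594185998001607642838421.5479932913");
> BigDecimal r = new BigDecimal("-12.0000000000");
>
> // Option 1: no intermediate rounding at all, then round once.
> BigDecimal unlimited = l.multiply(r, MathContext.UNLIMITED)
>     .setScale(6, RoundingMode.HALF_UP);
> // -> -110083130231976019291714061058.575919
>
> // Option 2: cap the intermediate result at 39 digits, truncating toward
> // zero, then round once in setPrecision.
> BigDecimal capped = l.multiply(r, new MathContext(39, RoundingMode.DOWN))
>     .setScale(6, RoundingMode.HALF_UP);
> // -> -110083130231976019291714061058.575919
> {code}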
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]