[ https://issues.apache.org/jira/browse/SPARK-40129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790107#comment-17790107 ]
Thomas Graves commented on SPARK-40129:
---------------------------------------

this looks like a dup of https://issues.apache.org/jira/browse/SPARK-45786

> Decimal multiply can produce the wrong answer because it rounds twice
> ---------------------------------------------------------------------
>
>                 Key: SPARK-40129
>                 URL: https://issues.apache.org/jira/browse/SPARK-40129
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0, 3.3.0, 3.4.0
>            Reporter: Robert Joseph Evans
>            Priority: Major
>              Labels: pull-request-available
>
> This looks like it has been around for a long time, but I have reproduced it in 3.2.0+.
> The example here multiplies a Decimal(38, 10) by another Decimal(38, 10), but I think
> it can be reproduced with other number combinations, and possibly with divide too.
> {code:java}
> Seq("9173594185998001607642838421.5479932913").toDF
>   .selectExpr("CAST(value as DECIMAL(38,10)) as a")
>   .selectExpr("a * CAST(-12 as DECIMAL(38,10))")
>   .show(truncate = false)
> {code}
> This produces an answer in Spark of {{-110083130231976019291714061058.575920}},
> but if I do the calculation in regular java BigDecimal I get
> {{-110083130231976019291714061058.575919}}:
> {code:java}
> BigDecimal l = new BigDecimal("9173594185998001607642838421.5479932913");
> BigDecimal r = new BigDecimal("-12.0000000000");
> BigDecimal prod = l.multiply(r);
> BigDecimal rounded_prod = prod.setScale(6, RoundingMode.HALF_UP);
> {code}
> Spark does essentially the same operations, but it uses Decimal instead of java's
> BigDecimal directly. Spark, by way of Decimal, sets a MathContext for the multiply
> operation with a max precision of 38 and HALF_UP rounding. That means the result of
> the multiply operation in Spark is {{-110083130231976019291714061058.57591950}},
> while for the java BigDecimal code the result is
> {{-110083130231976019291714061058.57591949560000000000}}.
> Then, in CheckOverflow for 3.2.0 and 3.3.0, or in the regular Multiply expression
> in 3.4.0, setScale is called (as part of Decimal.setPrecision). At that point the
> already-rounded number is rounded yet again, resulting in what is arguably a wrong
> answer from Spark.
> I have not fully tested this, but it looks like we could just remove the MathContext
> in Decimal entirely, or set it to UNLIMITED; all of the decimal operations appear to
> do their own overflow checking and rounding anyway. If we want to reduce the total
> memory usage, we could instead set the max precision to 39 and truncate (round down)
> the result in the MathContext. That would then let us round the result correctly in
> setPrecision afterwards.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
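[Editorial note] The double rounding described in the report, and the precision-39
truncate idea it proposes, can both be sketched in plain Java. This is a standalone
illustration of the behavior the report describes, not Spark's actual Decimal code;
the class and variable names are invented for the example:

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class DecimalDoubleRounding {
    public static void main(String[] args) {
        BigDecimal l = new BigDecimal("9173594185998001607642838421.5479932913");
        BigDecimal r = new BigDecimal("-12.0000000000");

        // What Spark's Decimal effectively does: multiply under a 38-digit
        // HALF_UP MathContext (first rounding, to 8 fraction digits here)...
        BigDecimal sparkProd = l.multiply(r, new MathContext(38, RoundingMode.HALF_UP));
        // ...then setScale to the result scale (second rounding, to 6 digits).
        BigDecimal sparkResult = sparkProd.setScale(6, RoundingMode.HALF_UP);

        // Plain BigDecimal: exact multiply, rounded exactly once.
        BigDecimal exactResult = l.multiply(r).setScale(6, RoundingMode.HALF_UP);

        System.out.println(sparkResult); // -110083130231976019291714061058.575920
        System.out.println(exactResult); // -110083130231976019291714061058.575919

        // The report's alternative: keep one extra digit (precision 39) and
        // truncate (round toward zero) in the MathContext, so the single
        // HALF_UP in setScale still sees unrounded digits and gets the
        // correct answer.
        BigDecimal truncProd = l.multiply(r, new MathContext(39, RoundingMode.DOWN));
        BigDecimal truncResult = truncProd.setScale(6, RoundingMode.HALF_UP);
        System.out.println(truncResult); // -110083130231976019291714061058.575919
    }
}
```

The exact product is -110083130231976019291714061058.5759194956; rounding it to 8
fraction digits first yields ...57591950, and rounding that to 6 digits hits the
exact-half case and rounds away from zero, which is how the extra ulp appears.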