Robert Joseph Evans created SPARK-40129:
-------------------------------------------
         Summary: Decimal multiply can produce the wrong answer because it rounds twice
             Key: SPARK-40129
             URL: https://issues.apache.org/jira/browse/SPARK-40129
         Project: Spark
      Issue Type: Bug
      Components: SQL
Affects Versions: 3.3.0, 3.2.0, 3.4.0
        Reporter: Robert Joseph Evans

This looks like it has been around for a long time, but I have reproduced it in 3.2.0+. The example here multiplies a Decimal(38, 10) by another Decimal(38, 10), but I think it can be reproduced with other precision/scale combinations, and possibly with divide too.

{code:java}
Seq("9173594185998001607642838421.5479932913").toDF
  .selectExpr("CAST(value as DECIMAL(38,10)) as a")
  .selectExpr("a * CAST(-12 as DECIMAL(38,10))")
  .show(truncate = false)
{code}

This produces an answer in Spark of {{-110083130231976019291714061058.575920}}. But if I do the same calculation directly with Java's BigDecimal, I get {{-110083130231976019291714061058.575919}}:

{code:java}
import java.math.BigDecimal;
import java.math.RoundingMode;

BigDecimal l = new BigDecimal("9173594185998001607642838421.5479932913");
BigDecimal r = new BigDecimal("-12.0000000000");
// The exact product: -110083130231976019291714061058.57591949560000000000
BigDecimal prod = l.multiply(r);
// Rounded once to the result scale of 6: -110083130231976019291714061058.575919
BigDecimal rounded_prod = prod.setScale(6, RoundingMode.HALF_UP);
{code}

Spark does essentially the same operations, but it uses its own Decimal class instead of Java's BigDecimal directly. Decimal sets a MathContext for the multiply operation with a maximum precision of 38 and HALF_UP rounding. That means the multiply itself already rounds: Spark's intermediate result is {{-110083130231976019291714061058.57591950}}, while the exact BigDecimal product is {{-110083130231976019291714061058.57591949560000000000}}. Then setScale is called (as part of Decimal.setPrecision), from CheckOverflow in 3.2.0 and 3.3.0, or from the Multiply expression itself in 3.4.0. At that point the already-rounded number is rounded yet again, producing what is arguably a wrong answer. The first sketch below reproduces this double rounding with plain BigDecimal.

I have not fully tested this, but it looks like we could just remove the MathContext entirely in Decimal, or set it to UNLIMITED; all of the decimal operations appear to do their own overflow checking and rounding anyway. If we want to limit the memory used by intermediate results, we could instead set the max precision to 39 and truncate (round down) the result in the math context. That would still let us round the result correctly in setPrecision afterwards (second sketch below).
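To make the double rounding concrete, here is a minimal sketch in plain BigDecimal that mimics what the description above says Decimal does internally (a MathContext with precision 38 and HALF_UP). The variable names are mine, not Spark's:

{code:java}
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

BigDecimal l = new BigDecimal("9173594185998001607642838421.5479932913");
BigDecimal r = new BigDecimal("-12.0000000000");

// First rounding: the multiply runs under a MathContext with max precision
// 38 and HALF_UP, so the exact 50-digit product ...058.57591949560000000000
// is cut to 38 significant digits (8 fractional digits here).
MathContext mc = new MathContext(38, RoundingMode.HALF_UP);
BigDecimal firstRounding = l.multiply(r, mc);
// firstRounding is -110083130231976019291714061058.57591950

// Second rounding: CheckOverflow / Decimal.setPrecision then rounds to the
// result scale of 6. The trailing 50 of the already-rounded value pushes
// the last digit up.
BigDecimal secondRounding = firstRounding.setScale(6, RoundingMode.HALF_UP);
// secondRounding is -110083130231976019291714061058.575920  (wrong)

// Rounding the exact product once instead gives the right answer:
// l.multiply(r).setScale(6, RoundingMode.HALF_UP)
//   == -110083130231976019291714061058.575919
{code}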
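And here is an untested sketch of the precision-39 alternative mentioned at the end. Keeping one guard digit beyond the 38 representable digits and truncating (RoundingMode.DOWN) means no rounding decision is made early, so the single HALF_UP in setPrecision still sees enough information to round correctly:

{code:java}
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

BigDecimal l = new BigDecimal("9173594185998001607642838421.5479932913");
BigDecimal r = new BigDecimal("-12.0000000000");

// Truncate to 39 significant digits: one guard digit beyond the 38 that
// Decimal can represent, with no rounding applied yet.
MathContext mc = new MathContext(39, RoundingMode.DOWN);
BigDecimal truncated = l.multiply(r, mc);
// truncated is -110083130231976019291714061058.575919495

// The single HALF_UP rounding in setPrecision now sees the first dropped
// digit (4) and correctly rounds down.
BigDecimal result = truncated.setScale(6, RoundingMode.HALF_UP);
// result is -110083130231976019291714061058.575919  (matches plain BigDecimal)
{code}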