[ https://issues.apache.org/jira/browse/SPARK-24362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16488464#comment-16488464 ]
Yuming Wang commented on SPARK-24362:
-------------------------------------

*{{SortMergeJoin}}* vs *{{BroadcastHashJoin}}* (the aggregate is aliased {{smj}}/{{bhj}} so the output tables identify which plan produced them):
{code}
test("SPARK-24362") {
  val df = spark.range(6).toDF("c1")
  // Disable broadcast joins so the self-join runs as a SortMergeJoin.
  withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1") {
    df.join(df, "c1").selectExpr("sum(cast(9.99 as double)) as smj").show()
  }
  // Raise the threshold so the same join runs as a BroadcastHashJoin.
  withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "100000") {
    df.join(df, "c1").selectExpr("sum(cast(9.99 as double)) as bhj").show()
  }
}
{code}
Results:
{noformat}
+------------------+
|               smj|
+------------------+
|59.940000000000005|
+------------------+

+-----+
|  bhj|
+-----+
|59.94|
+-----+
{noformat}

> SUM function precision issue
> ----------------------------
>
>                 Key: SPARK-24362
>                 URL: https://issues.apache.org/jira/browse/SPARK-24362
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> How to reproduce:
> {noformat}
> bin/spark-shell --conf spark.sql.autoBroadcastJoinThreshold=-1
>
> scala> val df = spark.range(6).toDF("c1")
> df: org.apache.spark.sql.DataFrame = [c1: bigint]
>
> scala> df.join(df, "c1").selectExpr("sum(cast(9.99 as double))").show()
> +-------------------------+
> |sum(CAST(9.99 AS DOUBLE))|
> +-------------------------+
> |       59.940000000000005|
> +-------------------------+
> {noformat}
>
> More links:
> [https://stackoverflow.com/questions/42158844/about-a-loss-of-precision-when-calculating-an-aggregate-sum-with-data-frames]
> [https://stackoverflow.com/questions/44134497/spark-sql-sum-function-issues-on-double-value]
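The discrepancy comes down to floating-point addition not being associative: the two join strategies partition the same six rows differently, so the partial sums feeding {{sum}} are combined in a different order. A minimal sketch in plain Scala, no Spark required; the object name and the 3+3 grouping are illustrative assumptions, not the exact grouping either plan uses:
{code}
object FpSumOrder {
  def main(args: Array[String]): Unit = {
    // Six copies of 9.99 -- the same values both query plans aggregate.
    val xs = Array.fill(6)(9.99)

    // One long chain of additions, as a single task summing every row would do.
    val sequential = xs.foldLeft(0.0)(_ + _)

    // Two partial sums of three values each, then one merge step,
    // as two tasks followed by a combine would do.
    val (left, right) = xs.splitAt(3)
    val partitioned = left.sum + right.sum

    println(s"sequential  = $sequential")   // 59.940000000000005
    println(s"partitioned = $partitioned")  // 59.94
  }
}
{code}
Both answers are correct to within double rounding; which one a query prints depends only on how the rows happened to be partitioned when the partial aggregates were merged, which is why this is generally considered expected IEEE-754 behavior rather than a defect in {{sum}}.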