[ https://issues.apache.org/jira/browse/SPARK-28610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marco Gaido resolved SPARK-28610.
---------------------------------
    Resolution: Won't Fix

Since the perf regression introduced by the change would be very high, this won't be fixed. Thanks.

> Support larger buffer for sum of long
> -------------------------------------
>
>                 Key: SPARK-28610
>                 URL: https://issues.apache.org/jira/browse/SPARK-28610
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Marco Gaido
>            Priority: Major
>
> The sum of a long field currently uses a buffer of type long.
> When the flag for throwing exceptions on overflow for arithmetic operations
> is turned on, this is a problem when an intermediate overflow occurs that
> would later be cancelled out by other rows. In such a case, we throw an
> exception even though the final result is representable as a long value. An
> example of this issue can be seen by running:
> {code}
> val df = sc.parallelize(Seq(100L, Long.MaxValue, -1000L)).toDF("a")
> df.select(sum($"a")).show()
> {code}
> According to [~cloud_fan]'s suggestion in
> https://github.com/apache/spark/pull/21599, we should introduce a config
> flag that lets users choose a wider datatype for the sum buffer, so that the
> above issue can be fixed.
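
For illustration, the intermediate-overflow case from the issue description can be reproduced without Spark. The following is a minimal sketch in plain Scala, not Spark's implementation: `SumBufferSketch`, `sumWithLongBuffer`, and `sumWithWideBuffer` are hypothetical names, `Math.addExact` stands in for the overflow-checking flag, and `BigInt` stands in for a wider buffer type.

```scala
// Illustrative sketch only: plain Scala, no Spark. All names are hypothetical.
object SumBufferSketch {
  // Same values as the example in the issue description.
  val values: Seq[Long] = Seq(100L, Long.MaxValue, -1000L)

  // Long buffer with overflow checking: throws ArithmeticException on the
  // intermediate sum 100 + Long.MaxValue, before -1000 is ever added.
  def sumWithLongBuffer(xs: Seq[Long]): Long =
    xs.foldLeft(0L)((acc, x) => Math.addExact(acc, x))

  // Wider buffer: accumulate in BigInt and check the range only once,
  // on the final result.
  def sumWithWideBuffer(xs: Seq[Long]): Long = {
    val total = xs.foldLeft(BigInt(0))((acc, x) => acc + BigInt(x))
    if (!total.isValidLong) throw new ArithmeticException("long overflow")
    total.toLong
  }

  def main(args: Array[String]): Unit = {
    // The long-buffer sum fails even though the final result fits in a long.
    println(scala.util.Try(sumWithLongBuffer(values)).isFailure) // prints "true"
    // The wide-buffer sum succeeds with Long.MaxValue - 900.
    println(sumWithWideBuffer(values) == Long.MaxValue - 900L)   // prints "true"
  }
}
```

The wide-buffer variant does arbitrary-precision arithmetic on every row, which hints at the kind of per-row cost that a wider buffer could carry; the actual regression cited in the resolution was not measured with this sketch.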