[ https://issues.apache.org/jira/browse/SPARK-24401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jorge Machado updated SPARK-24401:
----------------------------------

Description:

Hi, I think I found a really ugly bug in Spark when performing aggregations on Decimal columns.

To reproduce:

{code:java}
val fact_df = spark.read.parquet("attached file")
val first_agg = fact_df.groupBy("id1", "id2", "start_date").agg(mean("projection_factor").alias("projection_factor"))
first_agg.show
val second_agg = first_agg.groupBy("id1", "id2").agg(max("projection_factor").alias("maxf"), min("projection_factor").alias("minf"))
second_agg.show
{code}

The first aggregation works fine, but the second aggregation seems to sum the values instead of taking the max. I tried Spark 2.2.0 and 2.3.0 and hit the same problem. The dataset has circa 800 rows and projection_factor holds values between 0 and 100, so the result should never exceed 100, yet we get 265820543091454.... back. (A self-contained sketch of the repro, for readers without the attachment, follows the issue summary below.)

> Aggregate on Decimal Types does not work
> ----------------------------------------
>
>                 Key: SPARK-24401
>                 URL: https://issues.apache.org/jira/browse/SPARK-24401
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0, 2.3.0
>            Reporter: Jorge Machado
>            Priority: Major
>        Attachments: testDF.parquet
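Since testDF.parquet is only available as an attachment, here is a minimal self-contained sketch that builds a stand-in DataFrame in code. The column names come from the report, but the DecimalType(38, 22) precision/scale and the sample values are assumptions for illustration, not taken from the attached file; whether the wrong results still reproduce depends on the Spark version.

{code:scala}
// Hypothetical stand-in for the attached testDF.parquet: column names match the
// report, but the DecimalType(38, 22) cast and the sample values are guesses.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{max, mean, min}
import org.apache.spark.sql.types.DecimalType

val spark = SparkSession.builder()
  .appName("SPARK-24401-repro")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val fact_df = Seq(
  ("a", "x", "2018-01-01", BigDecimal("1.2345")),
  ("a", "x", "2018-01-01", BigDecimal("2.3456")),
  ("a", "x", "2018-01-02", BigDecimal("3.4567")),
  ("b", "y", "2018-01-01", BigDecimal("4.5678"))
).toDF("id1", "id2", "start_date", "projection_factor")
  // High-precision, high-scale decimals are where aggregation bugs tend to surface.
  .withColumn("projection_factor", $"projection_factor".cast(DecimalType(38, 22)))

val first_agg = fact_df.groupBy("id1", "id2", "start_date")
  .agg(mean("projection_factor").alias("projection_factor"))
first_agg.show() // per-group means, all between 0 and 5 for this sample data

val second_agg = first_agg.groupBy("id1", "id2")
  .agg(max("projection_factor").alias("maxf"), min("projection_factor").alias("minf"))
second_agg.show() // on affected versions, maxf/minf come back implausibly large
{code}

The same pattern can be run against the real data by replacing the Seq with spark.read.parquet on the attached testDF.parquet.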