[
https://issues.apache.org/jira/browse/SPARK-24401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jorge Machado updated SPARK-24401:
----------------------------------
Description:
Hi,
I think I found a really ugly bug in Spark when performing aggregations with
Decimals.
To reproduce:
{code:scala}
val df = spark.read.parquet("attached file")
val first_agg = df.groupBy("id1", "id2", "start_date")
  .agg(mean("projection_factor").alias("projection_factor"))
first_agg.show
val second_agg = first_agg.groupBy("id1", "id2")
  .agg(max("projection_factor").alias("maxf"), min("projection_factor").alias("minf"))
second_agg.show
{code}
The first aggregation works fine, but the second aggregation seems to be summing
instead of taking the max value. I tried with Spark 2.2.0 and 2.3.0; same problem.
The dataset has circa 800 rows and projection_factor has values from 0 to 100.
The result should not be bigger than 5, but we get 265820543091454.... back.
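A quick way to narrow this down is to inspect how the second aggregation is
planned and what type the first aggregation actually produces. This diagnostic
sketch is not part of the original report; it assumes the first_agg and
second_agg values from the snippet above:
{code:scala}
// Print the analyzed/optimized/physical plans for the second aggregation.
// If Max/Min appear correctly here, the problem is likely in the Decimal
// evaluation path rather than in planning.
second_agg.explain(true)

// Check the intermediate type: averaging a Decimal column usually widens
// its precision/scale, so projection_factor may no longer be Decimal(38,10)
// after the first aggregation.
first_agg.printSchema()
{code}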
The code below is not 100% the same, but I think there is really a bug there:
{code:java}
import java.math.BigDecimal;
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.catalyst.expressions.GenericRow;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.max;
import static org.apache.spark.sql.functions.mean;

// Four identical decimal values, one per column of the schema below.
BigDecimal[] objects = new BigDecimal[]{
        new BigDecimal(3.5714285714D),
        new BigDecimal(3.5714285714D),
        new BigDecimal(3.5714285714D),
        new BigDecimal(3.5714285714D)};
Row dataRow = new GenericRow(objects);
Row dataRow2 = new GenericRow(objects);
StructType structType = new StructType()
        .add("id1", DataTypes.createDecimalType(38, 10), true)
        .add("id2", DataTypes.createDecimalType(38, 10), true)
        .add("id3", DataTypes.createDecimalType(38, 10), true)
        .add("id4", DataTypes.createDecimalType(38, 10), true);
// sparkSession is an existing SparkSession.
final Dataset<Row> dataFrame =
        sparkSession.createDataFrame(Arrays.asList(dataRow, dataRow2), structType);
System.out.println(dataFrame.schema());
dataFrame.show();
// First aggregation: mean over a Decimal(38,10) column works as expected.
final Dataset<Row> df1 = dataFrame.groupBy("id1", "id2")
        .agg(mean("id3").alias("projection_factor"));
df1.show();
// Second aggregation: max over the averaged Decimal column returns a wrong value.
final Dataset<Row> df2 = df1
        .groupBy("id1")
        .agg(max("projection_factor"));
df2.show();
{code}
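As an additional data point (not in the original report), one can run the same
two-level aggregation with the averaged column cast to double. If that returns a
value in the expected 0-100 range while the Decimal version does not, the bug is
isolated to the Decimal code path. A hedged sketch, assuming the df loaded from
the attached parquet file:
{code:scala}
import org.apache.spark.sql.functions.{col, max, mean, min}

// Same aggregation as above, but detouring through DoubleType to
// sidestep the Decimal aggregation buffer.
val first_agg_dbl = df.groupBy("id1", "id2", "start_date")
  .agg(mean(col("projection_factor").cast("double")).alias("projection_factor"))
val second_agg_dbl = first_agg_dbl.groupBy("id1", "id2")
  .agg(max("projection_factor").alias("maxf"),
       min("projection_factor").alias("minf"))
second_agg_dbl.show()
{code}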
was:
Hi,
I think I found a really ugly bug in Spark when performing aggregations with
Decimals.
To reproduce:
{code:scala}
val df = spark.read.parquet("attached file")
val first_agg = df.groupBy("id1", "id2", "start_date")
  .agg(mean("projection_factor").alias("projection_factor"))
first_agg.show
val second_agg = first_agg.groupBy("id1", "id2")
  .agg(max("projection_factor").alias("maxf"), min("projection_factor").alias("minf"))
second_agg.show
{code}
The first aggregation works fine, but the second aggregation seems to be summing
instead of taking the max value. I tried with Spark 2.2.0 and 2.3.0; same problem.
The dataset has circa 800 rows and projection_factor has values from 0 to 100.
The result should not be bigger than 5, but we get 265820543091454.... back.
> Aggregate on Decimal Types does not work
> ----------------------------------------
>
> Key: SPARK-24401
> URL: https://issues.apache.org/jira/browse/SPARK-24401
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.0, 2.3.0
> Reporter: Jorge Machado
> Priority: Major
> Attachments: testDF.parquet
>
>
> Hi,
> I think I found a really ugly bug in Spark when performing aggregations with
> Decimals.
> To reproduce:
>
> {code:scala}
> val df = spark.read.parquet("attached file")
> val first_agg = df.groupBy("id1", "id2", "start_date")
>   .agg(mean("projection_factor").alias("projection_factor"))
> first_agg.show
> val second_agg = first_agg.groupBy("id1", "id2")
>   .agg(max("projection_factor").alias("maxf"), min("projection_factor").alias("minf"))
> second_agg.show
> {code}
> The first aggregation works fine, but the second aggregation seems to be
> summing instead of taking the max value. I tried with Spark 2.2.0 and 2.3.0;
> same problem.
> The dataset has circa 800 rows and projection_factor has values from 0 to
> 100. The result should not be bigger than 5, but we get
> 265820543091454.... back.
>
> The code below is not 100% the same, but I think there is really a bug there:
>
> {code:java}
> import java.math.BigDecimal;
> import java.util.Arrays;
>
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Row;
> import org.apache.spark.sql.catalyst.expressions.GenericRow;
> import org.apache.spark.sql.types.DataTypes;
> import org.apache.spark.sql.types.StructType;
>
> import static org.apache.spark.sql.functions.max;
> import static org.apache.spark.sql.functions.mean;
>
> // Four identical decimal values, one per column of the schema below.
> BigDecimal[] objects = new BigDecimal[]{
>         new BigDecimal(3.5714285714D),
>         new BigDecimal(3.5714285714D),
>         new BigDecimal(3.5714285714D),
>         new BigDecimal(3.5714285714D)};
> Row dataRow = new GenericRow(objects);
> Row dataRow2 = new GenericRow(objects);
> StructType structType = new StructType()
>         .add("id1", DataTypes.createDecimalType(38, 10), true)
>         .add("id2", DataTypes.createDecimalType(38, 10), true)
>         .add("id3", DataTypes.createDecimalType(38, 10), true)
>         .add("id4", DataTypes.createDecimalType(38, 10), true);
> // sparkSession is an existing SparkSession.
> final Dataset<Row> dataFrame =
>         sparkSession.createDataFrame(Arrays.asList(dataRow, dataRow2), structType);
> System.out.println(dataFrame.schema());
> dataFrame.show();
> // First aggregation: mean over a Decimal(38,10) column works as expected.
> final Dataset<Row> df1 = dataFrame.groupBy("id1", "id2")
>         .agg(mean("id3").alias("projection_factor"));
> df1.show();
> // Second aggregation: max over the averaged Decimal column returns a wrong value.
> final Dataset<Row> df2 = df1
>         .groupBy("id1")
>         .agg(max("projection_factor"));
> df2.show();
> {code}