[
https://issues.apache.org/jira/browse/SPARK-17758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yin Huai resolved SPARK-17758.
------------------------------
Resolution: Fixed
Fix Version/s: 2.1.0
2.0.2
Issue resolved by pull request 15348
[https://github.com/apache/spark/pull/15348]
> Spark Aggregate function LAST returns null on an empty partition
> ------------------------------------------------------------------
>
> Key: SPARK-17758
> URL: https://issues.apache.org/jira/browse/SPARK-17758
> Project: Spark
> Issue Type: Bug
> Components: Input/Output
> Affects Versions: 2.0.0
> Environment: Spark 2.0.0
> Reporter: Franck Tago
> Labels: correctness
> Fix For: 2.0.2, 2.1.0
>
>
> My Environment
> Spark 2.0.0
> I have included the physical plan of my application below.
> Issue description
> The result from a query that uses the LAST function are incorrect.
> The output obtained for the column that corresponds to the last function is
> null .
> My input data contain 3 rows .
> The application resulted in 2 stages
> The first stage consisted of 3 tasks .
> The first task/partition contains 2 rows
> The second task/partition contains 1 row
> The last task/partition contain 0 rows
> The result from the query executed for the LAST column call is NULL which I
> believe is due to the PARTIAL_LAST on the last partition .
> I believe that this behavior is incorrect. The PARTIAL_LAST call on an empty
> partition should not return null .
> {noformat}
> == Physical Plan ==
> InsertIntoHiveTable MetastoreRelation default, bdm_3449_tgt20, true, false
> +- *Project [last(C3_1)#51 AS field#102, cast(round(max(C3_0)#50, 0) as int)
> AS field1#103, cast(round(max(C3_0)#50, 0) as int) AS field2#104]
> +- SortAggregate(key=[], functions=[max(C3_0#40),last(C3_1#41, false)],
> output=[max(C3_0)#50,last(C3_1)#51])
> +- SortAggregate(key=[],
> functions=[partial_max(C3_0#40),partial_last(C3_1#41, false)],
> output=[max#91,last#92])
> +- *Project [CAST(sum(C1_0) AS DOUBLE)#27 AS C3_0#40, last(C1_1)#28
> AS C3_1#41]
> +- SortAggregate(key=[], functions=[sum(cast(C1_0#17 as
> bigint)),last(C1_1#18, false)], output=[CAST(sum(C1_0) AS
> DOUBLE)#27,last(C1_1)#28])
> +- Exchange SinglePartition
> +- SortAggregate(key=[],
> functions=[partial_sum(cast(C1_0#17 as bigint)),partial_last(C1_1#18,
> false)], output=[sum#95L,last#96])
> +- *Project [field1#7 AS C1_0#17, field#6 AS C1_1#18]
> +- HiveTableScan [field1#7, field#6],
> MetastoreRelation default, bdm_3449_src, alias
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]