[
https://issues.apache.org/jira/browse/SPARK-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yin Huai updated SPARK-8712:
----------------------------
Summary: Hive's Parser does not support distinct aggregations with OVER
clause (was: Window function does not work with distinct aggregations)
> Hive's Parser does not support distinct aggregations with OVER clause
> ---------------------------------------------------------------------
>
> Key: SPARK-8712
> URL: https://issues.apache.org/jira/browse/SPARK-8712
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 1.4.0
> Reporter: Yin Huai
>
> Hive's parser ignores Window spec when a distinct aggregation is used.
> {code}
> scala> Seq((1, 2, 3)).toDF("i", "j", "k").registerTempTable("t")
> scala> sql("select count(distinct j) over (partition by i) from
> t").explain(true)
> == Parsed Logical Plan ==
> 'Project [UnresolvedAlias
> CountDistinct
> UnresolvedAttribute [j]
> ]
> 'UnresolvedRelation [t], None
> == Analyzed Logical Plan ==
> _c0: bigint
> Aggregate [COUNT(DISTINCT j#23) AS _c0#27L]
> Subquery t
> Project [_1#19 AS i#22,_2#20 AS j#23,_3#21 AS k#24]
> LocalRelation [_1#19,_2#20,_3#21], [[1,2,3]]
> == Optimized Logical Plan ==
> Aggregate [COUNT(DISTINCT j#23) AS _c0#27L]
> LocalRelation [j#23], [[2]]
> == Physical Plan ==
> GeneratedAggregate false, [CombineAndCount(partialSets#28) AS _c0#27L], false
> Exchange SinglePartition
> GeneratedAggregate true, [AddToHashSet(j#23) AS partialSets#28], false
> LocalTableScan [j#23], [[2]]
> Code Generation: true
> == RDD ==
> scala> sql("select count(j) over (partition by i) from t").explain(true)
> == Parsed Logical Plan ==
> 'Project [UnresolvedAlias
> WindowExpression
> UnresolvedWindowFunction count
> UnresolvedAttribute [j]
> WindowSpecDefinition UnspecifiedFrame
> UnresolvedAttribute [i]
> ]
> 'UnresolvedRelation [t], None
> == Analyzed Logical Plan ==
> _c0: bigint
> Project [_c0#31L]
> Project [j#23,i#22,_c0#31L,_c0#31L]
> Window [j#23,i#22],
> [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount(j#23)
> WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED
> FOLLOWING AS _c0#31L], WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING
> AND UNBOUNDED FOLLOWING
> Project [j#23,i#22]
> Subquery t
> Project [_1#19 AS i#22,_2#20 AS j#23,_3#21 AS k#24]
> LocalRelation [_1#19,_2#20,_3#21], [[1,2,3]]
> == Optimized Logical Plan ==
> Project [_c0#31L]
> Window [j#23,i#22],
> [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount(j#23)
> WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED
> FOLLOWING AS _c0#31L], WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING
> AND UNBOUNDED FOLLOWING
> LocalRelation [j#23,i#22], [[2,1]]
> == Physical Plan ==
> Project [_c0#31L]
> Window [j#23,i#22],
> [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount(j#23)
> WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED
> FOLLOWING AS _c0#31L], WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING
> AND UNBOUNDED FOLLOWING
> ExternalSort [i#22 ASC], false
> Exchange (HashPartitioning 200)
> LocalTableScan [j#23,i#22], [[2,1]]
> Code Generation: true
> == RDD ==
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]