[ https://issues.apache.org/jira/browse/SPARK-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yin Huai updated SPARK-8972:
----------------------------
Assignee: Cheng Hao
> Incorrect result for rollup
> ---------------------------
>
> Key: SPARK-8972
> URL: https://issues.apache.org/jira/browse/SPARK-8972
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Cheng Hao
> Assignee: Cheng Hao
> Priority: Critical
> Fix For: 1.5.0
>
>
> {code:java}
> import sqlContext.implicits._
> case class KeyValue(key: Int, value: String)
> val df = sc.parallelize(1 to 5).map(i=>KeyValue(i, i.toString)).toDF
> df.registerTempTable("foo")
> sqlContext.sql("select count(*) as cnt, key % 100, GROUPING__ID from foo group by key % 100 with rollup").show(100)
> // output
> +---+---+------------+
> |cnt|_c1|GROUPING__ID|
> +---+---+------------+
> |  1|  4|           0|
> |  1|  4|           1|
> |  1|  5|           0|
> |  1|  5|           1|
> |  1|  1|           0|
> |  1|  1|           1|
> |  1|  2|           0|
> |  1|  2|           1|
> |  1|  3|           0|
> |  1|  3|           1|
> +---+---+------------+
> {code}
> After checking the code, it seems we don't support complex expressions
> (anything other than simple column names) as GROUP BY keys for rollup, or
> for cube. Worse, no error is reported when a rollup key is a complex
> expression, so we silently get a very confusing result like the one above.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)