[ 
https://issues.apache.org/jira/browse/SPARK-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Hao updated SPARK-8972:
-----------------------------
    Description: 
{code:java}
import sqlContext.implicits._
case class KeyValue(key: Int, value: String)
val df = sc.parallelize(1 to 5).map(i=>KeyValue(i, i.toString)).toDF
df.registerTempTable("foo")
sqlContext.sql("select count(*) as cnt, key % 100, GROUPING__ID from foo group by key % 100 with rollup").show(100)
// output
+---+---+------------+
|cnt|_c1|GROUPING__ID|
+---+---+------------+
|  1|  4|           0|
|  1|  4|           1|
|  1|  5|           0|
|  1|  5|           1|
|  1|  1|           0|
|  1|  1|           1|
|  1|  2|           0|
|  1|  2|           1|
|  1|  3|           0|
|  1|  3|           1|
+---+---+------------+
{code}
After checking the code, it seems we don't support complex expressions (only simple column names) as GROUP BY keys for rollup, and the same applies to cube. Worse, no error is reported when a complex expression appears in the rollup keys, so we get the very confusing result shown above.
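For comparison, a rollup over the single expression {{key % 100}} should produce two grouping sets: the detail rows (GROUPING__ID 0, one row per distinct remainder) and a single grand-total row (GROUPING__ID 1) — not a duplicated 0/1 pair per value as in the output above. A minimal plain-Python sketch of the expected semantics (not Spark code; the function name is hypothetical):

```python
def rollup_counts(keys):
    # ROLLUP over one expression yields two grouping sets:
    #   {key % 100} -> grouping id 0 (detail rows)
    #   {}          -> grouping id 1 (one grand-total row)
    detail = {}
    for k in keys:
        g = k % 100
        detail[g] = detail.get(g, 0) + 1
    rows = [(cnt, g, 0) for g, cnt in sorted(detail.items())]
    rows.append((len(keys), None, 1))  # grand total; the key is NULL here
    return rows

# Keys 1..5 as in the example: five detail rows, then one total row.
for cnt, g, gid in rollup_counts([1, 2, 3, 4, 5]):
    print(cnt, g, gid)
```

So for keys 1 to 5 the correct result has six rows (five with GROUPING__ID 0, one total with GROUPING__ID 1), whereas the output above has ten.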

  was:
{code:java}
import sqlContext.implicits._
case class KeyValue(key: Int, value: String)
val df = sc.parallelize(1 to 5).map(i=>KeyValue(i, i.toString)).toDF
df.registerTempTable("foo")
sqlContext.sql("select count(*) as cnt, key % 100, GROUPING__ID from foo group by key % 100 with rollup").show(100)
// output
+---+---+------------+
|cnt|_c1|GROUPING__ID|
+---+---+------------+
|  1|  4|           0|
|  1|  4|           1|
|  1|  5|           0|
|  1|  5|           1|
|  1|  1|           0|
|  1|  1|           1|
|  1|  2|           0|
|  1|  2|           1|
|  1|  3|           0|
|  1|  3|           1|
+---+---+------------+
{code}


> Incorrect result for rollup
> ---------------------------
>
>                 Key: SPARK-8972
>                 URL: https://issues.apache.org/jira/browse/SPARK-8972
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Cheng Hao
>            Priority: Critical
>
> {code:java}
> import sqlContext.implicits._
> case class KeyValue(key: Int, value: String)
> val df = sc.parallelize(1 to 5).map(i=>KeyValue(i, i.toString)).toDF
> df.registerTempTable("foo")
> sqlContext.sql("select count(*) as cnt, key % 100, GROUPING__ID from foo group by key % 100 with rollup").show(100)
> // output
> +---+---+------------+
> |cnt|_c1|GROUPING__ID|
> +---+---+------------+
> |  1|  4|           0|
> |  1|  4|           1|
> |  1|  5|           0|
> |  1|  5|           1|
> |  1|  1|           0|
> |  1|  1|           1|
> |  1|  2|           0|
> |  1|  2|           1|
> |  1|  3|           0|
> |  1|  3|           1|
> +---+---+------------+
> {code}
> After checking the code, it seems we don't support complex expressions (only simple column names) as GROUP BY keys for rollup, and the same applies to cube. Worse, no error is reported when a complex expression appears in the rollup keys, so we get the very confusing result shown above.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
