[ https://issues.apache.org/jira/browse/SPARK-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Armbrust resolved SPARK-4564.
-------------------------------------
Resolution: Won't Fix
I'm going to close this as Won't Fix unless there is a major objection. Happy to
accept PRs to clarify the documentation, though :)
> SchemaRDD.groupBy(groupingExprs)(aggregateExprs) doesn't return the groupingExprs as part of the output schema
> --------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-4564
> URL: https://issues.apache.org/jira/browse/SPARK-4564
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.1.0
> Environment: Mac OS X, local mode, but should hold true for all environments
> Reporter: Dean Wampler
>
> In the following example, I would expect the "grouped" schema to contain two
> fields, the String name and the Long count, but it only contains the Long
> count.
> {code}
> // Assumes val sc = new SparkContext(...), e.g., in Spark Shell
> import org.apache.spark.sql.{SQLContext, SchemaRDD}
> import org.apache.spark.sql.catalyst.expressions._
> val sqlc = new SQLContext(sc)
> import sqlc._
> case class Record(name: String, n: Int)
> val records = List(
> Record("three", 1),
> Record("three", 2),
> Record("two", 3),
> Record("three", 4),
> Record("two", 5))
> val recs = sc.parallelize(records)
> recs.registerTempTable("records")
> val grouped = recs.select('name, 'n).groupBy('name)(Count('n) as 'count)
> grouped.printSchema
> // root
> // |-- count: long (nullable = false)
> grouped foreach println
> // [2]
> // [3]
> {code}
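The Won't Fix resolution is consistent with SQL-style aggregation: like `SELECT ... GROUP BY`, the aggregate's output contains only the expressions listed in its aggregate list, and the grouping expressions are used for grouping but are not projected automatically. Assuming the Spark 1.1 SchemaRDD DSL used above, listing the grouping column in the aggregate list as well, e.g. `recs.groupBy('name)('name, Count('n) as 'count)`, should produce both the `name` and `count` fields. Because running Spark is outside the scope of a self-contained snippet, the sketch below models the two variants with plain Scala collections only; the object and method names are hypothetical, and results are sorted by name for determinism (Spark does not guarantee row order, which is why the original output appears as `[2]`, `[3]`).

```scala
// Plain-Scala model (no Spark) of the aggregate semantics behind the resolution:
// only expressions listed in the aggregate list appear in the output.

// Mirrors the reporter's case class.
case class Record(name: String, n: Int)

// Hypothetical helper object, for illustration only.
object GroupBySemantics {
  val records: List[Record] = List(
    Record("three", 1),
    Record("three", 2),
    Record("two", 3),
    Record("three", 4),
    Record("two", 5))

  // Models groupBy('name)(Count('n) as 'count): one row per group that holds
  // only the count; the grouping key is used but not projected.
  def countOnly(rs: List[Record]): List[Long] =
    rs.groupBy(_.name).toList.sortBy(_._1).map { case (_, group) => group.size.toLong }

  // Models groupBy('name)('name, Count('n) as 'count): listing the grouping
  // column in the aggregate list puts it in the output next to the count.
  def nameAndCount(rs: List[Record]): List[(String, Long)] =
    rs.groupBy(_.name).toList.sortBy(_._1).map { case (name, group) => (name, group.size.toLong) }

  def main(args: Array[String]): Unit = {
    countOnly(records).foreach(println)
    nameAndCount(records).foreach(println)
  }
}
```

Counting the group size matches `Count('n)` here only because `n` is a non-nullable `Int`; `Count` skips null values.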
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)