[ https://issues.apache.org/jira/browse/SPARK-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278819#comment-14278819 ]

Hamel Ajay Kothari commented on SPARK-5097:
-------------------------------------------

Am I correct in interpreting that this would let us trivially select columns 
at runtime, since we could just write {{SchemaRDD(stringColumnName)}}? In the 
Catalyst world, selecting a column known only at runtime was a real pain: the 
only documented ways to do it were quasiquotes or 
{{SchemaRDD.baseLogicalPlan.resolve()}}. The first can't be constructed at 
runtime (as far as I know), and the second forces you to depend on Catalyst 
expressions.
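For concreteness, here is a minimal sketch of the kind of thing I'd like to be 
able to write, assuming the proposed API lets you apply a SchemaRDD/DataFrame 
to a string column name. The data, the column names, and the exact {{select}} 
call are my assumptions about how the final API might look, not something 
confirmed by the design doc:

{code:scala}
import org.apache.spark.sql.SQLContext

// Assumes an existing SparkContext named `sc`; the data below is made up.
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val df = sc.parallelize(Seq(("widget", 100), ("gadget", 250)))
  .toDF("product", "amount")

// The column to select is only known at runtime (user input, config, etc.).
val runtimeColumnName = "amount"

// df("colName") resolves the column by name directly: no quasiquotes,
// no reaching into baseLogicalPlan.resolve().
val selected = df.select(df(runtimeColumnName))
selected.show()
{code}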

Also, in this plan, is there any way to control the names of the columns 
produced by groupBy + aggregate (or by similar methods that add columns)?
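If aliasing is part of the proposal, I'd expect it to look something like the 
sketch below; the {{as}} call, the {{sum}} helper, and the column names are 
guesses on my part rather than anything I've seen in the doc (this reuses the 
hypothetical {{df}} from the sketch above):

{code:scala}
import org.apache.spark.sql.functions.sum

// Hypothetical: name the aggregate output column explicitly rather than
// accepting an auto-generated name such as "SUM(amount)".
val totals = df.groupBy("product")
  .agg(sum("amount").as("totalAmount"))

// totals would then have two columns: "product" and "totalAmount".
totals.printSchema()
{code}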

> Adding data frame APIs to SchemaRDD
> -----------------------------------
>
>                 Key: SPARK-5097
>                 URL: https://issues.apache.org/jira/browse/SPARK-5097
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>            Priority: Critical
>         Attachments: DesignDocAddingDataFrameAPIstoSchemaRDD.pdf
>
>
> SchemaRDD, through its DSL, already provides common data frame 
> functionality. However, the DSL was originally created for constructing 
> test cases, without much consideration for end-user usability or API 
> stability. This design doc proposes a set of API changes for Scala and 
> Python to make the SchemaRDD DSL API more usable and stable.


