[ https://issues.apache.org/jira/browse/SPARK-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278819#comment-14278819 ]
Hamel Ajay Kothari commented on SPARK-5097:
-------------------------------------------

Am I correct in interpreting that this would allow us to trivially select columns at runtime, since we'd just use {{SchemaRDD(stringColumnName)}}? In the Catalyst world, selecting columns known only at runtime was a real pain, because the only ways documented to do it were to use quasiquotes or {{SchemaRDD.baseLogicalPlan.resolve()}}. The first couldn't be constructed at runtime (as far as I know), and the second required you to depend on expressions.

Also, is there any way to control the names of the resulting columns from groupBy+aggregate (or similar methods that add columns) in this plan?

> Adding data frame APIs to SchemaRDD
> -----------------------------------
>
>                 Key: SPARK-5097
>                 URL: https://issues.apache.org/jira/browse/SPARK-5097
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>            Priority: Critical
>         Attachments: DesignDocAddingDataFrameAPIstoSchemaRDD.pdf
>
>
> SchemaRDD, through its DSL, already provides common data frame
> functionalities. However, the DSL was originally created for constructing
> test cases without much end-user usability and API stability consideration.
> This design doc proposes a set of API changes for Scala and Python to make
> the SchemaRDD DSL API more usable and stable.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
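For reference, both runtime column selection and aggregate-column naming are addressed in the DataFrame API that eventually shipped in Spark 1.3. A minimal sketch (the names {{select}}, {{groupBy}}, {{agg}}, and {{as}} are from that released API, not from the design doc attached here; the table name and SQLContext setup are hypothetical):

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._

// Assumed to be available in the application; details omitted.
val sqlContext: SQLContext = ???
val df = sqlContext.table("events") // hypothetical table

// A column known only at runtime: a plain string works --
// no quasiquotes or baseLogicalPlan.resolve() needed.
val columnName: String = args(0)
val projected = df.select(df(columnName))

// Controlling the name of an aggregate output column via .as(...),
// instead of accepting the generated name like SUM(amount).
val totals = df.groupBy("category").agg(sum("amount").as("total_amount"))
```

This snippet is illustrative only; it requires a running SparkContext and is not self-contained.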