[ https://issues.apache.org/jira/browse/SPARK-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261613#comment-14261613 ]

Reynold Xin commented on SPARK-2247:
------------------------------------

I took another look at various data frame implementations out there and at 
SchemaRDD. As it stands, SchemaRDD along with its DSL already delivers most of 
the basic functionality of a data frame (e.g. you can read data in to create 
new SchemaRDDs, you can project a SchemaRDD, you can run aggregations, and you 
can even join different SchemaRDDs).
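
For concreteness, here is a rough sketch of what the existing SchemaRDD DSL 
looks like today (Spark 1.x). The file names and column names are made up, and 
the exact implicits and signatures are from memory, so treat this as 
illustrative rather than exact:

{code}
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.catalyst.expressions.Count

val sc = new SparkContext("local", "schemardd-dsl-sketch")
val sqlContext = new SQLContext(sc)
import sqlContext._  // Symbol -> attribute and expression operator implicits

// Read JSON into a SchemaRDD with an inferred schema (file name is made up).
val people = sqlContext.jsonFile("people.json")

// Projection and filtering.
val adults = people.where('age >= 21).select('name, 'age)

// Aggregation: group by department and count names.
val counts = people.groupBy('dept)(Count('name) as 'n)

// Join two SchemaRDDs, then filter on the join condition.
val depts = sqlContext.jsonFile("departments.json")
val joined = people.join(depts).where('dept === 'deptName)
{code}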

The problem with the DSL is that it was originally designed for writing test 
cases, so its various functions are often more verbose than they need to be. 
There is also no Python variant of the DSL. We should probably just update the 
DSL API to make it more usable and call it a day. Then people programming 
against structured data can use the SchemaRDD DSL instead of using the RDD API 
directly.

As for R, it should be part of SparkR.

> Data frame (or Pandas) like API for structured data
> ---------------------------------------------------
>
>                 Key: SPARK-2247
>                 URL: https://issues.apache.org/jira/browse/SPARK-2247
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark, Spark Core, SQL
>    Affects Versions: 1.0.0
>            Reporter: venu k tangirala
>              Labels: features
>
> It would be nice to have R or Python pandas-like data frames on Spark.
> 1) To be able to access the RDD data frame from Python with pandas 
> 2) To be able to access the RDD data frame from R 
> 3) To be able to access the RDD data frame from Scala's Saddle 


