[jira] [Commented] (SPARK-12173) Consider supporting DataSet API in SparkR

Sun Rui (JIRA) Tue, 21 Jun 2016 19:28:30 -0700

    [ https://issues.apache.org/jira/browse/SPARK-12173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343212#comment-15343212 ]


Sun Rui commented on SPARK-12173:
---------------------------------

[~rxin] Yes, R doesn't need compile-time type safety, but map/reduce functions are 
popular in R. For example, lapply() applies a function to each item of a list or 
vector. SparkR currently supports spark.lapply(), which is similar to lapply(). Its 
implementation depends on RDD internally. We could change the 
implementation to use Dataset without exposing the Dataset API, something like:
   Convert the R vector/list to a Dataset
   Call Dataset functions on it
   Collect the result back as an R vector/list
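
The three steps above can be sketched roughly as follows. This is plain Python with no Spark calls: PartitionedList is a hypothetical stand-in for a Dataset, and lapply_like() is a hypothetical wrapper that plays the role of spark.lapply(). It only illustrates how the distributed collection can stay hidden from the caller; it is not how SparkR is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List


class PartitionedList:
    """Hypothetical stand-in for a Dataset: a list split into ordered partitions."""

    def __init__(self, partitions: List[List[Any]]):
        self.partitions = partitions

    @classmethod
    def from_list(cls, items: List[Any], num_partitions: int = 2) -> "PartitionedList":
        # Step 1: turn the local list into a partitioned collection.
        n = max(1, num_partitions)
        size = max(1, -(-len(items) // n))  # ceiling division
        return cls([items[i:i + size] for i in range(0, len(items), size)])

    def map(self, func: Callable[[Any], Any]) -> "PartitionedList":
        # Step 2: apply the function to each partition in parallel.
        # pool.map yields results in input order, so partition order is kept.
        with ThreadPoolExecutor() as pool:
            mapped = list(
                pool.map(lambda part: [func(x) for x in part], self.partitions)
            )
        return PartitionedList(mapped)

    def collect(self) -> List[Any]:
        # Step 3: flatten the partitions back into one local list.
        return [x for part in self.partitions for x in part]


def lapply_like(items: List[Any], func: Callable[[Any], Any],
                num_partitions: int = 2) -> List[Any]:
    """Hypothetical wrapper: the caller never sees PartitionedList."""
    return PartitionedList.from_list(items, num_partitions).map(func).collect()
```

Calling lapply_like([1, 2, 3], lambda x: x * 2) returns [2, 4, 6]; the caller passes in a local list and gets a local list back, which is the trade-off discussed next.
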
Not exposing the Dataset API means SparkR does not provide a distributed vector/list 
abstraction. SparkR users would have to use DataFrame for distributed vectors/lists, 
which does not seem convenient for R users. 
[~shivaram] what do you think?

> Consider supporting DataSet API in SparkR
> -----------------------------------------
>
>                 Key: SPARK-12173
>                 URL: https://issues.apache.org/jira/browse/SPARK-12173
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SparkR
>            Reporter: Felix Cheung
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
