[
https://issues.apache.org/jira/browse/SPARK-12173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343212#comment-15343212
]
Sun Rui commented on SPARK-12173:
---------------------------------
[~rxin] Yes, R doesn't need compile-time type safety, but map/reduce functions are
popular in R; for example, lapply() applies a function to each item of a list or
vector. For now, SparkR supports spark.lapply(), which is similar to lapply(). Its
implementation depends on RDD internally. We could change the
implementation to use Dataset without exposing the Dataset API, something like:
1. Change the R vector/list to a Dataset
2. Call Dataset functions on it
3. Collect the result back as an R vector/list
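
The three steps above can be sketched as follows. This is only a minimal illustration in plain Python, not SparkR or Spark code: a partitioned local list stands in for a Dataset, a thread pool stands in for the executors, and the names to_partitions and local_lapply are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def to_partitions(items: Sequence[T], num_partitions: int) -> List[List[T]]:
    """Step 1 (hypothetical helper): split a local list into contiguous
    partitions. The list of partitions is a stand-in for a Dataset."""
    n = max(1, min(num_partitions, len(items) or 1))
    size, extra = divmod(len(items), n)
    partitions, start = [], 0
    for i in range(n):
        # The first `extra` partitions take one additional element.
        end = start + size + (1 if i < extra else 0)
        partitions.append(list(items[start:end]))
        start = end
    return partitions


def local_lapply(items: Sequence[T], func: Callable[[T], U],
                 num_partitions: int = 4) -> List[U]:
    """Hypothetical lapply-like wrapper: the caller passes a plain list and
    gets a plain list back, and never sees the partitioned representation."""
    # Step 1: convert the local list to the partitioned stand-in.
    partitions = to_partitions(items, num_partitions)

    def run(part: List[T]) -> List[U]:
        return [func(x) for x in part]

    # Step 2: apply the function to each partition (threads stand in for
    # executors; pool.map returns results in partition order).
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        mapped = list(pool.map(run, partitions))

    # Step 3: collect the per-partition results back into one local list.
    return [y for part in mapped for y in part]


if __name__ == "__main__":
    print(local_lapply([1, 2, 3, 4, 5], lambda x: x * x, num_partitions=2))
```

The point of the sketch is the shape of the interface: the partitioned structure is created and consumed inside the wrapper, so no distributed list type is exposed to the caller.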
Not exposing the Dataset API means SparkR does not provide a distributed vector/list
abstraction, so SparkR users have to use DataFrame for distributed vectors/lists,
which does not seem convenient for R users.
[~shivaram] What do you think?
> Consider supporting DataSet API in SparkR
> -----------------------------------------
>
> Key: SPARK-12173
> URL: https://issues.apache.org/jira/browse/SPARK-12173
> Project: Spark
> Issue Type: Sub-task
> Components: SparkR
> Reporter: Felix Cheung
>