[
https://issues.apache.org/jira/browse/SPARK-6817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720845#comment-14720845
]
Shivaram Venkataraman commented on SPARK-6817:
----------------------------------------------
The idea behind having `dapplyCollect` was that it might be easier to implement,
since the output doesn't need to be converted to a valid Spark DataFrame on the
JVM and could instead be just any R data frame. But I agree that adding more
keywords is confusing for users, and we could avoid this in a couple of ways:
(1) implement the type conversion from R to the JVM first, so we wouldn't need this
(2) have a slightly different class on the JVM that only supports collect on it
(i.e. not a DataFrame) and use that to implement `dapplyCollect`
Regarding gapply -- SparkR (and dplyr) already have a `group_by` function that
does the grouping, and in SparkR it returns a `GroupedData` object. Right now
the only function available on a `GroupedData` object is `agg`, which performs
aggregations. We could instead support `dapply` on `GroupedData` objects, and
then the syntax would be something like:
grouped_df <- group_by(df, df$city)
collect(dapply(grouped_df, function(group) {}))
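
A slightly fleshed-out sketch of what that proposed syntax could look like in
practice. Note this API did not exist at the time of this comment -- the
`dapply`-on-`GroupedData` signature, the example data, and the UDF body are all
hypothetical and illustrative only:

```r
# Hypothetical example, assuming a SparkR session is already running and
# assuming dapply() were extended to accept GroupedData (it was not, at the
# time of this comment).
df <- createDataFrame(sqlContext,
                      data.frame(city = c("SF", "SF", "NY"),
                                 temp = c(60, 62, 75)))

grouped_df <- group_by(df, df$city)

# The UDF would receive each group as a local R data.frame and return a
# data.frame; collect() would bring the combined result back to the driver.
result <- collect(dapply(grouped_df, function(group) {
  data.frame(city = group$city[1], mean_temp = mean(group$temp))
}))
```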
cc [~rxin]
> DataFrame UDFs in R
> -------------------
>
> Key: SPARK-6817
> URL: https://issues.apache.org/jira/browse/SPARK-6817
> Project: Spark
> Issue Type: New Feature
> Components: SparkR, SQL
> Reporter: Shivaram Venkataraman
>
> This depends on some internal interfaces of Spark SQL; it should be done after
> merging into Spark.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)