[ https://issues.apache.org/jira/browse/SPARK-6817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720845#comment-14720845 ]

Shivaram Venkataraman commented on SPARK-6817:
----------------------------------------------

The idea behind having `dapplyCollect` was that it might be easier to implement, 
since the output doesn't need to be converted to a valid Spark DataFrame on the 
JVM and could instead be any R data frame. But I agree that adding more keywords 
is confusing for users, and we could avoid this in a couple of ways:
(1) implement the type conversion from R to the JVM first, so we wouldn't need 
this
(2) have a slightly different class on the JVM that only supports collect 
(i.e. not a DataFrame) and use that to bring the results back to R

Regarding gapply -- SparkR (and dplyr) already have a `group_by` function that 
does the grouping and in SparkR this returns a `GroupedData` object. Right now 
the only function available on the `GroupedData` object is `agg` to perform 
aggregations on it. We could instead support `dapply` on `GroupedData` objects 
and then the syntax would be something like

grouped_df <- group_by(df, df$city)
# each group is passed to the R function as a local data frame
collect(dapply(grouped_df, function(group) {} ))
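For comparison, the per-group semantics sketched above can be illustrated with a 
purely local split/apply/combine in base R -- this is just an analogy for what 
`collect(dapply(grouped_df, f))` would return, not SparkR code, and the toy data 
frame and column names here (`local_df`, `city`, `temp`) are made up for the 
example:

```r
# Toy local data frame standing in for a collected Spark DataFrame
local_df <- data.frame(city = c("SF", "SF", "NY"), temp = c(60, 64, 80))

# Split the rows by city, apply a function to each group, and bind the
# per-group results back together
groups  <- split(local_df, local_df$city)
results <- lapply(groups, function(group) {
  data.frame(city = group$city[1], avg_temp = mean(group$temp))
})
do.call(rbind, results)
```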

cc [~rxin]

> DataFrame UDFs in R
> -------------------
>
>                 Key: SPARK-6817
>                 URL: https://issues.apache.org/jira/browse/SPARK-6817
>             Project: Spark
>          Issue Type: New Feature
>          Components: SparkR, SQL
>            Reporter: Shivaram Venkataraman
>
> This depends on some internal interfaces of Spark SQL, so it should be done 
> after merging into Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
