[ https://issues.apache.org/jira/browse/SPARK-12922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15353142#comment-15353142 ]
Narine Kokhlikyan edited comment on SPARK-12922 at 6/28/16 3:03 PM:
--------------------------------------------------------------------
Thank you [~timhunter] for sharing this information with us.
It is a nice idea. I think it could be seen as an extension of the current
gapply implementation.
In general, whether the keys are useful depends on the use case. Most likely
the user would want to see the matching key for each group's output, so it
would make sense to attach/append the keys by default.
A user who doesn't need the keys can easily detach/drop those columns.
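
To make the "append keys by default, drop them if unneeded" idea concrete, here is a minimal local sketch in base R. It is not SparkR code and not the gapply implementation: `gapply_with_keys`, the sample data, and the column names are hypothetical stand-ins that only mimic the behavior discussed above on an ordinary data.frame.

```r
# Hypothetical local stand-in for gapply() that attaches the keys by default.
# Works on a plain data.frame; it only illustrates the proposed behavior.
gapply_with_keys <- function(df, cols, func) {
  # Split the rows into one data.frame per distinct key combination.
  groups <- split(df, df[cols], drop = TRUE)
  pieces <- lapply(groups, function(group) {
    key <- group[1, cols, drop = FALSE]   # one-row data.frame of key values
    out <- func(key, group)               # user function returns a data.frame
    # Repeat the key row once per output row and prepend it.
    cbind(key[rep(1, nrow(out)), , drop = FALSE], out)
  })
  result <- do.call(rbind, pieces)
  rownames(result) <- NULL
  result
}

df <- data.frame(dept = c("a", "a", "b"), salary = c(10, 20, 30))

# The user function returns only the computed value; the key is attached for it.
with_keys <- gapply_with_keys(df, "dept", function(key, group) {
  data.frame(mean_salary = mean(group$salary))
})

# A user who does not need the keys drops those columns afterwards.
without_keys <- with_keys[, setdiff(names(with_keys), "dept"), drop = FALSE]
```

With this default, the user function does not have to copy the key into its own output, and removing the key columns afterwards is a single subsetting step.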
> Implement gapply() on DataFrame in SparkR
> -----------------------------------------
>
> Key: SPARK-12922
> URL: https://issues.apache.org/jira/browse/SPARK-12922
> Project: Spark
> Issue Type: Sub-task
> Components: SparkR
> Affects Versions: 1.6.0
> Reporter: Sun Rui
> Assignee: Narine Kokhlikyan
> Fix For: 2.0.0
>
>
> gapply() applies an R function to each group of a DataFrame, where groups are
> formed by one or more columns, and returns a DataFrame. It is like
> GroupedDataSet.flatMapGroups() in the Dataset API.
> Two API styles are supported:
> 1.
> {code}
> gd <- groupBy(df, col1, ...)
> gapply(gd, function(grouping_key, group) {}, schema)
> {code}
> 2.
> {code}
> gapply(df, grouping_columns, function(grouping_key, group) {}, schema)
> {code}
> R function input: the grouping key values and a local data.frame holding
> that group's data
> R function output: a local data.frame
> Schema specifies the Row format of the output of the R function. It must
> match the R function's output.
> Note that map-side combination (partial aggregation) is not supported; users
> can do map-side combination via dapply().
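
The contract described above (the function receives a key and a local data.frame, returns a local data.frame, and that output must match the declared schema) can be sketched locally in base R. This is an illustration only: `matches_schema` is a hypothetical helper, and representing the schema as a named character vector of R classes is an assumption made for the sketch, not how SparkR represents schemas.

```r
# Hypothetical helper: does a function's output match a declared schema?
# The schema here is a named character vector (column name -> R class),
# which is an assumption of this sketch.
matches_schema <- function(out, schema) {
  if (!is.data.frame(out)) return(FALSE)
  col_classes <- vapply(out, function(col) class(col)[1], character(1))
  identical(names(out), names(schema)) &&
    identical(unname(col_classes), unname(schema))
}

schema <- c(dept = "character", mean_salary = "numeric")

# A function following the (grouping_key, group) -> data.frame contract.
good_func <- function(grouping_key, group) {
  data.frame(dept = grouping_key,
             mean_salary = mean(group$salary),
             stringsAsFactors = FALSE)
}

# A function whose output column name differs from the schema.
bad_func <- function(grouping_key, group) {
  data.frame(dept = grouping_key,
             avg = mean(group$salary),
             stringsAsFactors = FALSE)
}

group <- data.frame(dept = c("a", "a"), salary = c(10, 20),
                    stringsAsFactors = FALSE)

good_ok <- matches_schema(good_func("a", group), schema)
bad_ok <- matches_schema(bad_func("a", group), schema)
```

Here `good_ok` is TRUE and `bad_ok` is FALSE, because the second function names its result column differently from the schema.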