[jira] [Comment Edited] (SPARK-12922) Implement gapply() on DataFrame in SparkR

Narine Kokhlikyan (JIRA) Tue, 28 Jun 2016 08:04:38 -0700

    [ https://issues.apache.org/jira/browse/SPARK-12922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15353142#comment-15353142 ]


Narine Kokhlikyan edited comment on SPARK-12922 at 6/28/16 3:03 PM:
--------------------------------------------------------------------

Thank you [~timhunter] for sharing this information with us.
It is a nice idea. I think it could be seen as an extension of the current 
gapply implementation.

In general, I think that whether the keys are useful depends on the use 
case. Most likely, the user would want to see the matching key for each 
group's output, so it would make sense to append the keys by default.
A user who doesn't need the keys can easily drop those columns.


was (Author: narine):
Thank you [~timhunter] for sharing this information with us.
It is a nice idea. I think that it could be seen as an extension of current 
gapply's implementation.

In general, I think that whether the keys are useful or not depends on the use 
case. Most probably, the user, naturally, would like to see the matching key of 
each group-output and it would make sense to attach/append the keys by default.
If the user doesn't need the keys he or she can easily detach/drop those 
columns.

> Implement gapply() on DataFrame in SparkR
> -----------------------------------------
>
>                 Key: SPARK-12922
>                 URL: https://issues.apache.org/jira/browse/SPARK-12922
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SparkR
>    Affects Versions: 1.6.0
>            Reporter: Sun Rui
>            Assignee: Narine Kokhlikyan
>             Fix For: 2.0.0
>
>
> gapply() applies an R function to each group of a DataFrame, where groups 
> are formed by one or more columns, and returns a DataFrame. It is like 
> GroupedDataset.flatMapGroups() in the Dataset API.
> Two API styles are supported:
> 1.
> {code}
> gd <- groupBy(df, col1, ...)
> gapply(gd, function(grouping_key, group) {}, schema)
> {code}
> 2.
> {code}
> gapply(df, grouping_columns, function(grouping_key, group) {}, schema) 
> {code}
> R function input: the grouping key values and a local data.frame holding 
> the data of that group.
> R function output: a local data.frame.
> Schema specifies the Row format of the output of the R function. It must 
> match the R function's output.
> Note that map-side combination (partial aggregation) is not supported; users 
> can do map-side combination via dapply().
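
To make the semantics under discussion concrete, here is a minimal local sketch in base R, not SparkR. It emulates the second API style on an ordinary data.frame and appends the grouping keys to each group's output by default, as suggested in the comment above. The helper name local_gapply, the sample data and the column names are hypothetical; the schema argument is omitted because nothing is serialized locally, and the key is passed as a one-row data.frame, which may differ from what the real gapply() passes.

```r
# Hypothetical helper: a local, base-R emulation of the gapply() semantics.
# It is NOT part of SparkR and does not touch Spark at all.
local_gapply <- function(df, grouping_columns, func) {
  # One data.frame piece per distinct combination of key values.
  groups <- split(df, df[grouping_columns], drop = TRUE)
  pieces <- lapply(groups, function(group) {
    # The key values are constant within a group, so take them from row 1.
    key <- group[1, grouping_columns, drop = FALSE]
    rownames(key) <- NULL
    out <- func(key, group)
    # Append the key columns to every output row of this group.
    cbind(key[rep(1, nrow(out)), , drop = FALSE], out, row.names = NULL)
  })
  result <- do.call(rbind, pieces)
  rownames(result) <- NULL
  result
}

# Illustrative data (assumed, not from the issue).
df <- data.frame(
  dept   = c("a", "a", "b", "b", "b"),
  salary = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)

# The user function sees the key and the group's rows, and returns a
# local data.frame without repeating the key itself.
avg_salary <- function(key, group) {
  data.frame(avg = mean(group$salary))
}

result <- local_gapply(df, "dept", avg_salary)
print(result)

# A user who does not need the keys can drop those columns afterwards.
without_keys <- result[, setdiff(names(result), "dept"), drop = FALSE]
```

With the keys attached by default, the user function stays short, and dropping the key columns afterwards is a one-line operation.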



