[
https://issues.apache.org/jira/browse/SPARK-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341727#comment-14341727
]
Sean Owen commented on SPARK-3312:
----------------------------------
Interesting: is the reduce / max / min in question here by key? We already have
the {{stats()}} method for RDDs of {{Double}} to take care of this for a whole
RDD. Rather than adding an API method for the by-key case, it's possible to use
{{StatCounter}} to compute all of these statistics at once over the values
collected for each key. Does that do the trick, or is this something more?
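To illustrate the idea, here is a minimal standalone sketch (in Python, not Spark's actual Scala {{StatCounter}}) of how a single combiner object can accumulate count, sum, min, and max in one pass per key, the way {{aggregateByKey}} would fold values in and merge partition-local results. The class and function names here are hypothetical, for illustration only:

```python
class MiniStatCounter:
    """Hypothetical, simplified stand-in for Spark's StatCounter."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.min = float("inf")
        self.max = float("-inf")

    def merge_value(self, v):
        # Like the seqOp in aggregateByKey: fold one value into the accumulator.
        self.count += 1
        self.total += v
        self.min = min(self.min, v)
        self.max = max(self.max, v)
        return self

    def merge_stats(self, other):
        # Like the combOp in aggregateByKey: merge two partial accumulators.
        self.count += other.count
        self.total += other.total
        self.min = min(self.min, other.min)
        self.max = max(self.max, other.max)
        return self

    def mean(self):
        return self.total / self.count if self.count else float("nan")


def stats_by_key(pairs):
    """Single-pass per-key stats, mimicking aggregateByKey with a StatCounter."""
    acc = {}
    for k, v in pairs:
        acc.setdefault(k, MiniStatCounter()).merge_value(v)
    return acc


data = [("a", 1.0), ("a", 3.0), ("b", 2.0)]
stats = stats_by_key(data)
```

The point is that min, max, mean, and count all come out of one pass over the values, with no need to materialize the full group the way {{groupByKey}} does.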
> Add a groupByKey which returns a special GroupBy object like in pandas
> ----------------------------------------------------------------------
>
> Key: SPARK-3312
> URL: https://issues.apache.org/jira/browse/SPARK-3312
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Reporter: holdenk
> Priority: Minor
>
> A common pattern which causes problems for new Spark users is calling
> groupByKey followed by a reduce. I'd like to make a special version of
> groupByKey which returns a groupBy object (like the pandas groupby object).
> The resulting class would have a number of functions (min, max, stats,
> reduce) which could all be implemented efficiently.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)