[ https://issues.apache.org/jira/browse/SPARK-13453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158955#comment-15158955 ]
Anton Anastasov commented on SPARK-13453:
-----------------------------------------
I will get back to you with a concrete use-case where having the key during the
aggregation allows for a performance optimization.
> Including groupKey in the signature of aggregateByKey
> -----------------------------------------------------
>
> Key: SPARK-13453
> URL: https://issues.apache.org/jira/browse/SPARK-13453
> Project: Spark
> Issue Type: Improvement
> Reporter: Anton Anastasov
> Priority: Minor
>
> The signature of the aggregateByKey method over PairRDDs does not provide
> access to the actual key. My proposal is to create a new overloaded method
> that includes the key. This is necessary when the aggregation depends on the
> key.
> A workaround exists today: map the PairRDD[K, V] to PairRDD[K, (K, V)] so that
> the key travels with each value, but this seems convoluted.
> Let me know what you think, and if I should go ahead with a pull request.
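
For illustration, the workaround mentioned in the quoted description can be
sketched without Spark. The plain-Python snippet below is only a sketch under
stated assumptions: aggregate_by_key is a hypothetical single-partition
stand-in for the zero-value/seqOp shape of aggregateByKey (it omits combOp),
and the pairs and per-key weights are made up.

```python
def aggregate_by_key(pairs, zero, seq_op):
    """Hypothetical stand-in: fold the values of each key, starting from zero.

    Like the seqOp of aggregateByKey, seq_op sees only the accumulator and
    the value, never the key.
    """
    acc = {}
    for key, value in pairs:
        acc[key] = seq_op(acc.get(key, zero), value)
    return acc


# Made-up input data and per-key weights (assumptions for this example).
pairs = [("a", 1), ("b", 2), ("a", 3)]
weights = {"a": 10, "b": 100}

# Workaround: turn (k, v) into (k, (k, v)) so the key is visible in seq_op.
keyed = [(k, (k, v)) for k, v in pairs]


def weighted_sum(total, key_and_value):
    key, value = key_and_value
    # The aggregation depends on the key, which now rides along in the value.
    return total + weights[key] * value


result = aggregate_by_key(keyed, 0, weighted_sum)
print(result)
```

The extra mapping step duplicates the key into every value, which is the
overhead the proposed overload would avoid.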