[jira] [Commented] (SPARK-13453) Including groupKey in the signature of aggregateByKey

Anton Anastasov (JIRA) Tue, 23 Feb 2016 06:28:41 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-13453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158955#comment-15158955
 ]


Anton Anastasov commented on SPARK-13453:
-----------------------------------------

I will get back to you with a concrete use-case where having the key during the 
aggregation allows for a performance optimization.

> Including groupKey in the signature of aggregateByKey
> -----------------------------------------------------
>
>                 Key: SPARK-13453
>                 URL: https://issues.apache.org/jira/browse/SPARK-13453
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: Anton Anastasov
>            Priority: Minor
>
> The signature of the aggregateByKey method over PairRDDs does not provide 
> access to the actual key. My proposal is to create a new overloaded method 
> that includes the key. This is necessary when the aggregation depends on the 
> key.
> A workaround is currently possible: map the PairRDD[K, V] to
> PairRDD[K, (K, V)] so that each value carries its key, but this seems
> convoluted.
> Let me know what you think, and if I should go ahead with a pull request.
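
A rough sketch of the problem and the workaround follows. It is a pure-Python, single-process model of aggregateByKey semantics, not Spark code. Within each partition, seq_op folds values into a per-key accumulator, and comb_op then merges accumulators across partitions; neither function receives the key. In Spark's Scala API, seqOp has type (U, V) => U and combOp has type (U, U) => U. The function aggregate_by_key_with_key is hypothetical and stands in for the overload proposed in this issue. It is built with the workaround described above. In Scala, the same workaround would look roughly like `rdd.map { case (k, v) => (k, (k, v)) }.aggregateByKey(zero)(seqOp, combOp)`. The partition data and the "count values above their key" aggregation are made-up illustrations.

```python
import copy
from typing import Any, Callable, Dict, Hashable, List, Tuple

# One "partition" is a list of (key, value) pairs.
Partition = List[Tuple[Hashable, Any]]


def aggregate_by_key(partitions: List[Partition], zero: Any,
                     seq_op: Callable[[Any, Any], Any],
                     comb_op: Callable[[Any, Any], Any]) -> Dict[Hashable, Any]:
    """Local model of aggregateByKey: seq_op folds values into a per-key
    accumulator inside each partition, then comb_op merges the per-partition
    accumulators. Neither function is given the key."""
    per_partition = []
    for part in partitions:
        acc: Dict[Hashable, Any] = {}
        for k, v in part:
            # Each key starts from a fresh copy of the zero value.
            current = acc[k] if k in acc else copy.deepcopy(zero)
            acc[k] = seq_op(current, v)
        per_partition.append(acc)

    result: Dict[Hashable, Any] = {}
    for acc in per_partition:
        for k, u in acc.items():
            result[k] = comb_op(result[k], u) if k in result else u
    return result


def aggregate_by_key_with_key(partitions, zero, seq_op_with_key, comb_op):
    """Hypothetical key-aware variant (not an existing Spark method), built
    with the workaround from the issue: copy the key into each value so
    that seq_op can read it."""
    keyed = [[(k, (k, v)) for k, v in part] for part in partitions]
    return aggregate_by_key(
        keyed, zero,
        lambda u, kv: seq_op_with_key(kv[0], u, kv[1]),
        comb_op,
    )


# Two "partitions" of (key, value) pairs; here each key acts as a threshold.
partitions = [[(1, 5), (3, 2), (1, 0)], [(3, 4), (1, 2), (3, 3)]]

# Key-independent aggregation: sum of values per key.
sums = aggregate_by_key(partitions, 0, lambda u, v: u + v, lambda a, b: a + b)
print(sums)  # {1: 7, 3: 9}

# Key-dependent aggregation: count values strictly greater than their key.
counts = aggregate_by_key_with_key(
    partitions, 0,
    lambda k, u, v: u + (1 if v > k else 0),
    lambda a, b: a + b,
)
print(counts)  # {1: 2, 3: 1}
```

The wrapper shows why the workaround feels convoluted. Every record carries a redundant copy of its key only so that seq_op can see it.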



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
