[
https://issues.apache.org/jira/browse/SPARK-1784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012108#comment-14012108
]
Matei Zaharia commented on SPARK-1784:
--------------------------------------
As discussed on https://github.com/apache/spark/pull/727/files, round-robin
assignment does not fit the contract of Partitioner: a Partitioner must
consistently map each key to the same partition ID. To get evenly sized
partitions for arbitrary datasets, use RDD.repartition() or coalesce()
instead.
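The contract violation can be sketched in plain Scala with a minimal stand-in for Spark's Partitioner trait (the trait and class names below are illustrative, not Spark's actual classes): a hash-style partitioner returns the same ID for the same key on every call, while a round-robin partitioner returns a different ID each time, so shuffles, joins, and lookups can no longer locate a key's partition.

```scala
// Minimal stand-in for Spark's Partitioner contract (illustrative names).
trait SimplePartitioner {
  def numPartitions: Int
  def getPartition(key: Any): Int
}

// Contract-abiding: the same key always maps to the same partition ID,
// mirroring what Spark's HashPartitioner does.
class HashLikePartitioner(val numPartitions: Int) extends SimplePartitioner {
  def getPartition(key: Any): Int = {
    val h = key.hashCode % numPartitions
    if (h < 0) h + numPartitions else h // keep the ID non-negative
  }
}

// Contract-violating: round-robin hands out the next slot regardless of
// the key, so repeated calls for the same key disagree.
class RoundRobinPartitioner(val numPartitions: Int) extends SimplePartitioner {
  private var next = -1
  def getPartition(key: Any): Int = {
    next = (next + 1) % numPartitions
    next
  }
}

object PartitionerContractDemo extends App {
  val hash = new HashLikePartitioner(4)
  val rr   = new RoundRobinPartitioner(4)

  // Consistent mapping: repeated calls agree.
  assert(hash.getPartition("a") == hash.getPartition("a"))

  // Round-robin: repeated calls for the same key disagree.
  assert(rr.getPartition("a") != rr.getPartition("a"))
}
```

This is why round-robin belongs in a repartitioning operation like repartition(), which shuffles once and forgets the assignment, rather than in a Partitioner, whose mapping must stay stable for the lifetime of the partitioned RDD.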
> Add a partitioner which partitions an RDD with each partition having
> specified # of keys
> ----------------------------------------------------------------------------------------
>
> Key: SPARK-1784
> URL: https://issues.apache.org/jira/browse/SPARK-1784
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Affects Versions: 0.9.0
> Reporter: Syed A. Hashmi
> Priority: Minor
>
> At times on mailing lists, I have seen people complaining about having no
> control over the # of keys per partition. RangePartitioner partitions keys
> into roughly equal-sized partitions, but in cases where a user wants full
> control over specifying an exact size, that is not possible today.
--
This message was sent by Atlassian JIRA
(v6.2#6252)