[
https://issues.apache.org/jira/browse/SPARK-1784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012108#comment-14012108
]
Matei Zaharia commented on SPARK-1784:
--------------------------------------
As discussed on https://github.com/apache/spark/pull/727/files, round-robin
assignment does not fit the contract of Partitioner: a Partitioner must
consistently map each key to the same partition ID. To get evenly sized
partitions for arbitrary datasets, use RDD.repartition() or coalesce()
instead.
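The contract violation can be sketched in plain Scala with a minimal stand-in for Spark's Partitioner trait (the trait and class names below are illustrative, not Spark's actual classes): a hash-style partitioner returns the same ID for the same key on every call, while a round-robin partitioner returns a different ID each time, so shuffles, joins, and lookups can no longer locate a key's partition.

```scala
// Minimal stand-in for Spark's Partitioner contract (illustrative names).
trait SimplePartitioner {
  def numPartitions: Int
  def getPartition(key: Any): Int
}

// Contract-abiding: the same key always maps to the same partition ID,
// mirroring what Spark's HashPartitioner does.
class HashLikePartitioner(val numPartitions: Int) extends SimplePartitioner {
  def getPartition(key: Any): Int = {
    val h = key.hashCode % numPartitions
    if (h < 0) h + numPartitions else h // keep the ID non-negative
  }
}

// Contract-violating: round-robin hands out the next slot regardless of
// the key, so repeated calls for the same key disagree.
class RoundRobinPartitioner(val numPartitions: Int) extends SimplePartitioner {
  private var next = -1
  def getPartition(key: Any): Int = {
    next = (next + 1) % numPartitions
    next
  }
}

object PartitionerContractDemo extends App {
  val hash = new HashLikePartitioner(4)
  val rr   = new RoundRobinPartitioner(4)

  // Consistent mapping: repeated calls agree.
  assert(hash.getPartition("a") == hash.getPartition("a"))

  // Round-robin: repeated calls for the same key disagree.
  assert(rr.getPartition("a") != rr.getPartition("a"))
}
```

This is why round-robin belongs in a repartitioning operation like repartition(), which shuffles once and forgets the assignment, rather than in a Partitioner, whose mapping must stay stable for the lifetime of the partitioned RDD.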
> Add a partitioner which partitions an RDD with each partition having
> specified # of keys
> ----------------------------------------------------------------------------------------
>
> Key: SPARK-1784
> URL: https://issues.apache.org/jira/browse/SPARK-1784
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Affects Versions: 0.9.0
> Reporter: Syed A. Hashmi
> Priority: Minor
>
> At times on mailing lists, I have seen people complaining about having no
> control over the # of keys per partition. RangePartitioner partitions keys
> into roughly equal-sized partitions, but in cases where a user wants full
> control over specifying an exact size, that is not possible today.
--
This message was sent by Atlassian JIRA
(v6.2#6252)