[ https://issues.apache.org/jira/browse/SPARK-4640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227754#comment-14227754 ]
Kevin Mader commented on SPARK-4640:
------------------------------------
I have code for both that I could merge in if there is interest.
> FixedRangePartitioner for partitioning items with a known range
> ---------------------------------------------------------------
>
> Key: SPARK-4640
> URL: https://issues.apache.org/jira/browse/SPARK-4640
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Reporter: Kevin Mader
>
> For the large datasets I work with, it is common to have lightweight keys
> and very heavy values (integer keys and large double-array values, for
> example). The keys, however, are known and unchanging. It would be nice if
> Spark had a built-in partitioner that could take advantage of this. A
> FixedRangePartitioner[T](keys: Seq[T], partitions: Int) would be ideal.
> Furthermore, this partitioner type could be extended to a
> PartitionerWithKnownKeys with a getAllKeys function, allowing the list of
> keys to be obtained without scanning the entire RDD.
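
For illustration only, here is a minimal sketch of how the two proposed types
might fit together. It is not the code referred to in the comment above. To
keep it compilable without Spark on the classpath, it uses a hypothetical
stand-in trait, KeyPartitioner, that mirrors the two abstract members of
org.apache.spark.Partitioner (numPartitions and getPartition). The
contiguous-range assignment, the de-duplication of keys, and the choice to
reject unknown keys are all assumptions; the issue does not specify them.

```scala
// Hypothetical stand-in mirroring the two abstract members of
// org.apache.spark.Partitioner, so the sketch compiles without Spark.
trait KeyPartitioner extends Serializable {
  def numPartitions: Int
  def getPartition(key: Any): Int
}

// Extension proposed in the issue: a partitioner that can list its keys
// without scanning the data it partitions.
trait PartitionerWithKnownKeys[T] extends KeyPartitioner {
  def getAllKeys: Seq[T]
}

// Assigns each known key to a partition by its position in the key list,
// so that neighbouring keys land in the same contiguous range.
class FixedRangePartitioner[T](keys: Seq[T], partitions: Int)
    extends PartitionerWithKnownKeys[T] {
  require(partitions > 0, "partitions must be positive")
  require(keys.nonEmpty, "keys must not be empty")

  // Assumption: duplicate keys are collapsed, keeping first-seen order.
  private val distinctKeys: Seq[T] = keys.distinct
  private val keyCount: Int = distinctKeys.size
  private val keyIndex: Map[Any, Int] =
    distinctKeys.zipWithIndex.toMap[Any, Int]

  override def numPartitions: Int = partitions

  override def getPartition(key: Any): Int = keyIndex.get(key) match {
    // Index i in [0, keyCount) maps to a partition in [0, partitions).
    case Some(i) => ((i.toLong * partitions) / keyCount).toInt
    // Assumption: a key outside the fixed set is treated as an error.
    case None => throw new IllegalArgumentException(s"Unknown key: $key")
  }

  override def getAllKeys: Seq[T] = distinctKeys
}
```

With Spark available, the class could extend org.apache.spark.Partitioner
instead of the stand-in and be passed to partitionBy on a pair RDD. A real
implementation would probably also want equals and hashCode, so that Spark
can recognise two RDDs partitioned this way as co-partitioned.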