[jira] [Commented] (SPARK-4640) FixedRangePartitioner for partitioning items with a known range

Kevin Mader (JIRA) Thu, 27 Nov 2014 07:37:20 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-4640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227754#comment-14227754
 ]


Kevin Mader commented on SPARK-4640:
------------------------------------

I have code for both that I could merge in if there is interest.

> FixedRangePartitioner for partitioning items with a known range
> ---------------------------------------------------------------
>
>                 Key: SPARK-4640
>                 URL: https://issues.apache.org/jira/browse/SPARK-4640
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Kevin Mader
>
> For the large datasets I work with, it is common to have lightweight keys
> and very heavy values (integers and large double arrays, for example). The
> keys themselves, however, are known and unchanging. It would be nice if
> Spark had a built-in partitioner that could take advantage of this. A
> FixedRangePartitioner[T](keys: Seq[T], partitions: Int) would be ideal.
> Furthermore, this partitioner type could be extended to a
> PartitionerWithKnownKeys with a getAllKeys function, allowing the list
> of keys to be obtained without scanning the entire RDD.
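
For illustration only, here is a minimal sketch of what the proposal above might
look like. It is not the reporter's code and not part of Spark. It borrows the
names FixedRangePartitioner, PartitionerWithKnownKeys and getAllKeys from the
description, and it assumes (this is not stated in the issue) that the supplied
keys are split, in the order given, into contiguous blocks of roughly equal
size. Only org.apache.spark.Partitioner, with its numPartitions and getPartition
members, is an existing Spark API here.

```scala
import org.apache.spark.Partitioner

// Hypothetical trait from the proposal: exposes the full key set
// without running a job over the RDD.
trait PartitionerWithKnownKeys[T] {
  def getAllKeys: Seq[T]
}

// Hypothetical sketch: assigns a fixed, known key list to partitions
// as contiguous blocks, following the order in which keys were supplied.
class FixedRangePartitioner[T](keys: Seq[T], partitions: Int)
    extends Partitioner with PartitionerWithKnownKeys[T] {

  require(partitions > 0, "partitions must be positive")
  require(keys.nonEmpty, "keys must not be empty")

  // Drop duplicates so that every key has exactly one position.
  private val distinctKeys: Seq[T] = keys.distinct

  // Position of each key in the supplied order.
  private val indexOf: Map[T, Int] = distinctKeys.zipWithIndex.toMap

  override def numPartitions: Int = partitions

  override def getPartition(key: Any): Int = {
    // Unknown keys are rejected, since the key set is assumed fixed.
    val i = indexOf.getOrElse(
      key.asInstanceOf[T],
      throw new IllegalArgumentException(s"Unknown key: $key"))
    // Scale the position into [0, partitions); Long avoids Int overflow.
    ((i.toLong * partitions) / distinctKeys.size).toInt
  }

  override def getAllKeys: Seq[T] = distinctKeys
}
```

With such a class, an RDD[(Int, Array[Double])] named rdd could be partitioned
with rdd.partitionBy(new FixedRangePartitioner(0 until 100, 10)). A real
implementation would probably also define equals and hashCode, because Spark
compares partitioners to decide whether a shuffle can be skipped.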



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

