Kevin Mader created SPARK-4640:
----------------------------------
Summary: FixedRangePartitioner for partitioning items with a known range
Key: SPARK-4640
URL: https://issues.apache.org/jira/browse/SPARK-4640
Project: Spark
Issue Type: Improvement
Components: Spark Core
Reporter: Kevin Mader
For the large datasets I work with, it is common to have lightweight keys and
very heavy values (integer keys and large double arrays as values, for
example). The keys, however, are known in advance and do not change. It would
be nice if Spark had a built-in partitioner that could take advantage of this.
A FixedRangePartitioner[T](keys: Seq[T], partitions: Int) would be ideal.
Furthermore, this partitioner type could be extended to a
PartitionerWithKnownKeys with a getAllKeys function, so that the list of keys
can be obtained without scanning the entire RDD.
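
A minimal sketch of what such a partitioner might look like is given below.
Neither FixedRangePartitioner nor PartitionerWithKnownKeys exists in Spark;
both are hypothetical names taken from this proposal. Modelling
PartitionerWithKnownKeys as a trait, splitting the keys into contiguous
near-equal ranges in the order supplied, and throwing on an unknown key are
all assumptions made for illustration. The only real Spark API used is the
abstract class org.apache.spark.Partitioner (numPartitions and getPartition).

```scala
import org.apache.spark.Partitioner

// Hypothetical trait from the proposal: exposes the key set without an RDD scan.
trait PartitionerWithKnownKeys[T] {
  def getAllKeys: Seq[T]
}

// Hypothetical partitioner from the proposal; not part of Spark.
// Keys are assigned to contiguous ranges in the order they are supplied.
class FixedRangePartitioner[T](keys: Seq[T], partitions: Int)
    extends Partitioner with PartitionerWithKnownKeys[T] {

  require(partitions > 0, "partitions must be positive")

  // De-duplicate while keeping the caller's order, then index each key.
  private val distinctKeys: IndexedSeq[T] = keys.distinct.toIndexedSeq
  private val indexOf: Map[T, Int] = distinctKeys.zipWithIndex.toMap

  override def numPartitions: Int = partitions

  // Map the key's position i in [0, n) to partition floor(i * partitions / n).
  // Assumption: an unknown key is an error rather than a hash fallback.
  override def getPartition(key: Any): Int =
    indexOf.get(key.asInstanceOf[T]) match {
      case Some(i) => ((i.toLong * partitions) / distinctKeys.size).toInt
      case None    => throw new NoSuchElementException(s"Unknown key: $key")
    }

  override def getAllKeys: Seq[T] = distinctKeys

  // Spark compares partitioners to decide whether a shuffle can be skipped,
  // so two instances built from the same keys and count should be equal.
  override def equals(other: Any): Boolean = other match {
    case that: FixedRangePartitioner[_] =>
      that.numPartitions == numPartitions && that.getAllKeys == getAllKeys
    case _ => false
  }

  override def hashCode: Int = 31 * getAllKeys.hashCode + numPartitions
}
```

Under these assumptions, a pair RDD could be laid out with
rdd.partitionBy(new FixedRangePartitioner(knownKeys, 8)), where knownKeys is a
placeholder for the caller's key list, and the keys could later be read back
from the partitioner via getAllKeys.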