Could you elaborate on how this would work? From what I can tell, this maps each entry to a tuple whose second element is always 0, so the partitioner now hashes something like ((1, 4), 0) and ((1, 3), 0) rather than just the keys 1 and 2, and the hashes vary widely. Mapping this way would therefore create more even partitions. But why reduceByKey afterward? Is that just an example of an operation that could follow, or does it provide some real value to the operation?
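For what it's worth, the partition-spreading effect can be sketched in plain Scala without Spark. This is only an illustration, not code from the thread: it assumes Spark's HashPartitioner behavior (partition = non-negative key.hashCode mod numPartitions) and compares hashing the key alone against hashing the whole (pair, 0) tuple.

```scala
// Minimal sketch (plain Scala, no Spark cluster needed).
// Assumption: Spark's HashPartitioner assigns
//   partition = nonNegativeMod(key.hashCode, numPartitions).
object FullTupleHashDemo {
  // Mirrors HashPartitioner.getPartition
  def getPartition(key: Any, numPartitions: Int): Int = {
    val mod = key.hashCode % numPartitions
    if (mod < 0) mod + numPartitions else mod
  }

  def main(args: Array[String]): Unit = {
    val pairs = Seq((1, 4), (1, 3), (2, 3), (2, 5), (2, 10))
    val numPartitions = 4

    // Hashing the key alone: only keys 1 and 2 exist, so at most
    // two partitions can ever receive data.
    val byKey = pairs.map(p => getPartition(p._1, numPartitions)).toSet

    // The dummy-value trick: the whole pair becomes the key, so
    // distinct entries can hash to distinct partitions.
    val byTuple = pairs.map(p => getPartition((p, 0), numPartitions)).toSet

    println(s"partitions used, key only:   $byKey")
    println(s"partitions used, full tuple: $byTuple")
  }
}
```

Note the reduceByKey in the suggestion is (as I read it) just one example of a shuffle operation that would actually apply the partitioner to the new composite key; any key-based shuffle would do.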
On Mon, Feb 22, 2016 at 5:48 PM, Takeshi Yamamuro <linguin....@gmail.com> wrote:

> Hi,
>
> How about adding dummy values?
> values.map(d => (d, 0)).reduceByKey(_ + _)
>
> On Tue, Feb 23, 2016 at 10:15 AM, jluan <jaylu...@gmail.com> wrote:
>
>> I was wondering, is there a way to force something like the hash
>> partitioner to use the entire entry of a PairRDD as a hash rather than
>> just the key?
>>
>> For example, if we have an RDD with values: PairRDD = [(1,4), (1,3),
>> (2,3), (2,5), (2,10)]. Rather than using keys 1 and 2, can we force the
>> partitioner to hash the entire tuple, such as (1,4)?
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Force-Partitioner-to-use-entire-entry-of-PairRDD-as-key-tp26299.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>
> --
> ---
> Takeshi Yamamuro