Re: Force Partitioner to use entire entry of PairRDD as key

2016-02-22 Thread Jay Luan
Thank you, that helps a lot.



Re: Force Partitioner to use entire entry of PairRDD as key

2016-02-22 Thread Takeshi Yamamuro
You're correct, reduceByKey is just an example.



-- 
---
Takeshi Yamamuro


Re: Force Partitioner to use entire entry of PairRDD as key

2016-02-22 Thread Jay Luan
Could you elaborate on how this would work?

So from what I can tell, this maps each entry to a tuple that always has 0 as
the second element. The hashes then vary much more, because we now hash
something like ((1,4), 0) and ((1,3), 0), so this mapping should create more
even partitions. Why reduce by key afterwards? Is that just an example of an
operation that could be done, or does it provide some real value here?





Re: Force Partitioner to use entire entry of PairRDD as key

2016-02-22 Thread Takeshi Yamamuro
Hi,

How about adding dummy values?
values.map(d => (d, 0)).reduceByKey(_ + _)
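To see why the wrapped pairs hash more evenly, here is a minimal plain-Scala sketch (no Spark required; the sample data and partition count are made up for illustration). `nonNegativeMod` mirrors the assignment rule Spark's HashPartitioner applies, i.e. `nonNegativeMod(key.hashCode, numPartitions)`:

```scala
// Mirrors the partition assignment of Spark's HashPartitioner.
def nonNegativeMod(hash: Int, numPartitions: Int): Int = {
  val raw = hash % numPartitions
  if (raw < 0) raw + numPartitions else raw
}

val pairs = Seq((1, 4), (1, 3), (2, 3), (2, 5), (2, 10))
val numPartitions = 4

// Key-only hashing: only the two distinct keys matter, so at most two
// partitions can ever receive data.
val byKey = pairs.map { case (k, _) => nonNegativeMod(k.##, numPartitions) }

// Whole-pair hashing via the (d, 0) trick: each distinct pair is a distinct
// hash input, so records can spread across all partitions.
val byPair = pairs.map(d => nonNegativeMod((d, 0).##, numPartitions))
```

On a real RDD the repartitioning itself would be done with a key-based operation such as `values.map(d => (d, 0)).partitionBy(new HashPartitioner(n))`; as noted later in the thread, the `reduceByKey` above is just one example of an operation that triggers the shuffle.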



-- 
---
Takeshi Yamamuro


Force Partitioner to use entire entry of PairRDD as key

2016-02-22 Thread jluan
I was wondering, is there a way to force something like the hash partitioner
to use the entire entry of a PairRDD as the hash input rather than just the
key?

For example, if we have a PairRDD with the values [(1, 4), (1, 3), (2, 3),
(2, 5), (2, 10)], rather than hashing only the keys 1 and 2, can we force
the partitioner to hash each entire tuple, such as (1, 4)?
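For background on why the default behaviour is limiting here: Spark's HashPartitioner derives a record's partition from the key's hashCode alone. A small plain-Scala sketch (no Spark needed; the sample data and partition count are assumptions for illustration):

```scala
// Mirrors the partition assignment of Spark's HashPartitioner:
// partition = nonNegativeMod(key.hashCode, numPartitions).
def nonNegativeMod(hash: Int, numPartitions: Int): Int = {
  val raw = hash % numPartitions
  if (raw < 0) raw + numPartitions else raw
}

val rdd = Seq((1, 4), (1, 3), (2, 3), (2, 5), (2, 10))
val requested = 8

// The partitioner only ever sees the keys 1 and 2, so no matter how many
// partitions are requested, at most two of them receive any data.
val used = rdd.map { case (k, _) => nonNegativeMod(k.##, requested) }.distinct
```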



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Force-Partitioner-to-use-entire-entry-of-PairRDD-as-key-tp26299.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org