There is probably a better way to do it, but I would register both as temp tables and then join them via SQL.
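A minimal sketch of that temp-table approach, assuming the Spark 1.3+ DataFrame API (`toDF` does not exist in earlier releases) and a local context; the table and column names here are illustrative, not from the thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("sql-join-sketch"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Illustrative stand-ins for the thread's RDD[(Long, Int)] and RDD[(Long, String)]
val x = sc.parallelize(Seq((4407L, 40), (2064L, 38)))          // (id, cnt)
val y = sc.parallelize(Seq((4407L, "Jhon"), (2064L, "Maria"))) // (id, name)

// Register both RDDs as temp tables via DataFrames
x.toDF("id", "cnt").registerTempTable("counts")
y.toDF("id", "name").registerTempTable("names")

// Join them with plain SQL
val joined = sqlContext.sql(
  """SELECT n.id, n.name, c.cnt
     FROM names n JOIN counts c ON n.id = c.id""")
```

In a spark-shell session `sc` (and on 1.3+ `sqlContext`) are already defined, so the first few lines can be dropped there.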
BR, Daniel

> On 20 Nov 2014, at 23:53, Harihar Nahak <hna...@wynyardgroup.com> wrote:
>
> I have a similar type of issue: I want to join two different RDDs into one RDD.
>
> file1.txt content (ID, count):
>
> val x: RDD[(Long, Int)] = sc.textFile("file1.txt")
>   .map(line => line.split(","))
>   .map(row => (row(0).toLong, row(1).toInt))
>
> [(4407, 40),
>  (2064, 38),
>  (7815, 10),
>  (5736, 17),
>  (8031, 3)]
>
> The second RDD, from file2.txt, contains (ID, name):
>
> val y: RDD[(Long, String)]  // ID is common to both RDDs
>
> [(4407, Jhon),
>  (2064, Maria),
>  (7815, Casto),
>  (5736, Ram),
>  (8031, XYZ)]
>
> and I'm expecting the result to look like this: [(ID, Name, Count)]
>
> [(4407, Jhon, 40),
>  (2064, Maria, 38),
>  (7815, Casto, 10),
>  (5736, Ram, 17),
>  (8031, XYZ, 3)]
>
> Any help will be really appreciated. Thanks.
>
>> On 21 November 2014 09:18, dsiegmann [via Apache Spark User List] <[hidden email]> wrote:
>> You want to use RDD.union (or SparkContext.union for many RDDs). These don't
>> join on a key. Union doesn't really do anything itself, so it is low
>> overhead. Note that the combined RDD will have all the partitions of the
>> original RDDs, so you may want to coalesce after the union.
>>
>> val x = sc.parallelize(Seq((1, 3), (2, 4)))
>> val y = sc.parallelize(Seq((3, 5), (4, 7)))
>> val z = x.union(y)
>>
>> z.collect
>> res0: Array[(Int, Int)] = Array((1,3), (2,4), (3,5), (4,7))
>>
>>> On Thu, Nov 20, 2014 at 3:06 PM, Blind Faith <[hidden email]> wrote:
>>> Say I have two RDDs with the following values:
>>>
>>> x = [(1, 3), (2, 4)]
>>>
>>> and
>>>
>>> y = [(3, 5), (4, 7)]
>>>
>>> and I want to have
>>>
>>> z = [(1, 3), (2, 4), (3, 5), (4, 7)]
>>>
>>> How can I achieve this? I know you can use outerJoin followed by map to
>>> achieve this, but is there a more direct way?
>>
>> --
>> Daniel Siegmann, Software Developer
>> Velos
>> Accelerating Machine Learning
>> 54 W 40th St, New York, NY 10018
>> E: [hidden email]  W: www.velos.io
>
> --
> Regards,
> Harihar Nahak
> BigData Developer
> Wynyard
> [hidden email] | Extn: 8019
>
> View this message in context: Re: How to join two RDDs with mutually exclusive keys
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-join-two-RDDs-with-mutually-exclusive-keys-tp19417p19419.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
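For the keyed case Harihar describes (the same IDs appear in both RDDs), a plain RDD.join followed by a map also produces (ID, Name, Count) without going through SQL. A sketch with the thread's sample data, run in local mode (the context setup is illustrative; in spark-shell `sc` already exists):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("join-sketch"))

val x = sc.parallelize(Seq((4407L, 40), (2064L, 38), (7815L, 10), (5736L, 17), (8031L, 3)))
val y = sc.parallelize(Seq((4407L, "Jhon"), (2064L, "Maria"), (7815L, "Casto"), (5736L, "Ram"), (8031L, "XYZ")))

// join pairs values by key, giving RDD[(Long, (Int, String))];
// the map then reshapes each record to (id, name, count)
val z = x.join(y).map { case (id, (count, name)) => (id, name, count) }
```

Unlike union, join shuffles both RDDs by key, so it is not "low overhead" in the sense discussed above; it is simply the right tool when the keys match rather than being mutually exclusive.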