There is probably a better way to do it, but I would register both as temp tables and then join them via SQL.
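A minimal sketch of that temp-table approach, assuming the Spark 1.3+ DataFrame API (`toDF` does not exist in earlier releases) and a local context; the table and column names here are illustrative, not from the thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("sql-join-sketch"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Illustrative stand-ins for the thread's RDD[(Long, Int)] and RDD[(Long, String)]
val x = sc.parallelize(Seq((4407L, 40), (2064L, 38)))          // (id, cnt)
val y = sc.parallelize(Seq((4407L, "Jhon"), (2064L, "Maria"))) // (id, name)

// Register both RDDs as temp tables via DataFrames
x.toDF("id", "cnt").registerTempTable("counts")
y.toDF("id", "name").registerTempTable("names")

// Join them with plain SQL
val joined = sqlContext.sql(
  """SELECT n.id, n.name, c.cnt
     FROM names n JOIN counts c ON n.id = c.id""")
```

In a spark-shell session `sc` (and on 1.3+ `sqlContext`) are already defined, so the first few lines can be dropped there.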
BR, Daniel

> On 20 Nov 2014, at 23:53, Harihar Nahak <hna...@wynyardgroup.com> wrote:
>
> I have a similar type of issue: I want to join two different RDDs into one RDD.
>
> file1.txt content (ID, count):
>
> val x: RDD[(Long, Int)] = sc.textFile("file1.txt")
>   .map(line => line.split(","))
>   .map(row => (row(0).toLong, row(1).toInt))
>
> [(4407, 40),
>  (2064, 38),
>  (7815, 10),
>  (5736, 17),
>  (8031, 3)]
>
> The second RDD, from file2.txt, contains (ID, name):
>
> val y: RDD[(Long, String)]  // ID is common to both RDDs
>
> [(4407, Jhon),
>  (2064, Maria),
>  (7815, Casto),
>  (5736, Ram),
>  (8031, XYZ)]
>
> and I'm expecting the result to look like this: [(ID, Name, Count)]
>
> [(4407, Jhon, 40),
>  (2064, Maria, 38),
>  (7815, Casto, 10),
>  (5736, Ram, 17),
>  (8031, XYZ, 3)]
>
> Any help will be really appreciated. Thanks.
>
>> On 21 November 2014 09:18, dsiegmann [via Apache Spark User List] <[hidden email]> wrote:
>> You want to use RDD.union (or SparkContext.union for many RDDs). These don't
>> join on a key. Union doesn't really do anything itself, so it is low
>> overhead. Note that the combined RDD will have all the partitions of the
>> original RDDs, so you may want to coalesce after the union.
>>
>> val x = sc.parallelize(Seq((1, 3), (2, 4)))
>> val y = sc.parallelize(Seq((3, 5), (4, 7)))
>> val z = x.union(y)
>>
>> z.collect
>> res0: Array[(Int, Int)] = Array((1,3), (2,4), (3,5), (4,7))
>>
>>> On Thu, Nov 20, 2014 at 3:06 PM, Blind Faith <[hidden email]> wrote:
>>> Say I have two RDDs with the following values:
>>>
>>> x = [(1, 3), (2, 4)]
>>>
>>> and
>>>
>>> y = [(3, 5), (4, 7)]
>>>
>>> and I want to have
>>>
>>> z = [(1, 3), (2, 4), (3, 5), (4, 7)]
>>>
>>> How can I achieve this? I know you can use outerJoin followed by map to
>>> achieve this, but is there a more direct way?
>>
>> --
>> Daniel Siegmann, Software Developer
>> Velos
>> Accelerating Machine Learning
>> 54 W 40th St, New York, NY 10018
>> E: [hidden email]  W: www.velos.io
>
> --
> Regards,
> Harihar Nahak
> BigData Developer
> Wynyard
> [hidden email] | Extn: 8019
>
> View this message in context: Re: How to join two RDDs with mutually exclusive keys
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-join-two-RDDs-with-mutually-exclusive-keys-tp19417p19419.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
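For the keyed case Harihar describes (the same IDs appear in both RDDs), a plain RDD.join followed by a map also produces (ID, Name, Count) without going through SQL. A sketch with the thread's sample data, run in local mode (the context setup is illustrative; in spark-shell `sc` already exists):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("join-sketch"))

val x = sc.parallelize(Seq((4407L, 40), (2064L, 38), (7815L, 10), (5736L, 17), (8031L, 3)))
val y = sc.parallelize(Seq((4407L, "Jhon"), (2064L, "Maria"), (7815L, "Casto"), (5736L, "Ram"), (8031L, "XYZ")))

// join pairs values by key, giving RDD[(Long, (Int, String))];
// the map then reshapes each record to (id, name, count)
val z = x.join(y).map { case (id, (count, name)) => (id, name, count) }
```

Unlike union, join shuffles both RDDs by key, so it is not "low overhead" in the sense discussed above; it is simply the right tool when the keys match rather than being mutually exclusive.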