I have a similar issue: I want to join two RDDs with different value types into a single RDD.

file1.txt contains (ID, count) pairs:

val x: RDD[(Long, Int)] = sc.textFile("file1.txt")
  .map(line => line.split(","))
  .map(row => (row(0).trim.toLong, row(1).trim.toInt))
[(4407, 40),
 (2064, 38),
 (7815, 10),
 (5736, 17),
 (8031, 3)]

The second RDD, from file2.txt, contains (ID, name) pairs, where ID is the key common to both RDDs:

val y: RDD[(Long, String)]
[(4407, Jhon),
 (2064, Maria),
 (7815, Casto),
 (5736, Ram),
 (8031, XYZ)]

and I'm expecting the result to look like [(ID, Name, Count)]:

[(4407, Jhon, 40),
 (2064, Maria, 38),
 (7815, Casto, 10),
 (5736, Ram, 17),
 (8031, XYZ, 3)]
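From what I've read so far, a keyed join followed by a map should produce that shape; here is a rough, untested sketch assuming x and y as defined above:

```scala
import org.apache.spark.rdd.RDD

// y.join(x) joins on the common ID key and yields (id, (name, count));
// the map then flattens each pair into a (id, name, count) triple.
val z: RDD[(Long, String, Int)] =
  y.join(x).map { case (id, (name, count)) => (id, name, count) }
```

Is a join the right tool here, or is there a cheaper way when every ID appears in both RDDs?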


Any help would be really appreciated. Thanks.




On 21 November 2014 09:18, dsiegmann [via Apache Spark User List] <
ml-node+s1001560n19419...@n3.nabble.com> wrote:

> You want to use RDD.union (or SparkContext.union for many RDDs). These
> don't join on a key. Union doesn't really do anything itself, so it is low
> overhead. Note that the combined RDD will have all the partitions of the
> original RDDs, so you may want to coalesce after the union.
>
> val x = sc.parallelize(Seq( (1, 3), (2, 4) ))
> val y = sc.parallelize(Seq( (3, 5), (4, 7) ))
> val z = x.union(y)
>
> z.collect
> res0: Array[(Int, Int)] = Array((1,3), (2,4), (3,5), (4,7))
>
>
> On Thu, Nov 20, 2014 at 3:06 PM, Blind Faith <[hidden email]> wrote:
>
>> Say I have two RDDs with the following values
>>
>> x = [(1, 3), (2, 4)]
>>
>> and
>>
>> y = [(3, 5), (4, 7)]
>>
>> and I want to have
>>
>> z = [(1, 3), (2, 4), (3, 5), (4, 7)]
>>
>> How can I achieve this. I know you can use outerJoin followed by map to
>> achieve this, but is there a more direct way for this.
>>
>
>
>
> --
> Daniel Siegmann, Software Developer
> Velos
> Accelerating Machine Learning
>
> 54 W 40th St, New York, NY 10018
> E: [hidden email] W: www.velos.io
>
>



-- 
Regards,
Harihar Nahak
BigData Developer
Wynyard
Email:hna...@wynyardgroup.com | Extn: 8019




View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-join-two-RDDs-with-mutually-exclusive-keys-tp19417p19423.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.