Say I have two RDDs with the following values:
x = [(1, 3), (2, 4)]
and
y = [(3, 5), (4, 7)]
and I want to have
z = [(1, 3), (2, 4), (3, 5), (4, 7)]
How can I achieve this? I know you can use an outer join followed by a map to
get there, but is there a more direct way?
You want to use RDD.union (or SparkContext.union for many RDDs). Neither of
these joins on a key; they simply concatenate the inputs. Union itself does
not shuffle or move any data, so it has low overhead. Note that the combined
RDD keeps all the partitions of the original RDDs, so you may want to
coalesce after the union.
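A minimal sketch of the union approach, assuming a local SparkContext and the
sample values from the question. The object name UnionExample, the helper
combine, the master setting local[2], and the target of 2 partitions are all
illustrative assumptions, not part of the original reply.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical example object; names here are for illustration only.
object UnionExample {
  // Concatenate two pair RDDs without joining on the key.
  def combine(x: RDD[(Int, Int)], y: RDD[(Int, Int)]): RDD[(Int, Int)] =
    x.union(y)

  def main(args: Array[String]): Unit = {
    // Assumption: run locally with two worker threads.
    val conf = new SparkConf().setAppName("UnionExample").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val x = sc.parallelize(Seq((1, 3), (2, 4)))
      val y = sc.parallelize(Seq((3, 5), (4, 7)))

      val z = combine(x, y)

      // The union keeps every partition of both inputs.
      println(s"x: ${x.partitions.length}, y: ${y.partitions.length}, " +
        s"z: ${z.partitions.length} partitions")

      // Optionally reduce the partition count; coalesce avoids a shuffle
      // by default. The target of 2 is an arbitrary choice.
      val compact = z.coalesce(2)
      println(compact.collect().sorted.mkString(", "))

      // For many RDDs, SparkContext.union takes a sequence in one call.
      val all = sc.union(Seq(x, y))
      println(all.count())
    } finally {
      sc.stop()
    }
  }
}
```

Whether to coalesce depends on how many partitions the inputs have and how
the result is used afterwards; for two small RDDs it is usually unnecessary.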
val x =
--
Regards,
Harihar Nahak
BigData
Sent from the Apache Spark User List mailing list archive at Nabble.com.