Hi,
This is my first post to the email list, so please give me some feedback if
I do something wrong.
To do operations on two RDDs to produce a new one, you can just use
zipPartitions, but if I have an arbitrary number of RDDs that I would like
to perform an operation on to produce a single RDD, how do
There are numerous ways to combine RDDs. In your case, it seems you have
several RDDs of the same type and you want to do an operation across all of
them as if they were a single RDD. The way to do this is SparkContext.union
or RDD.union, which have minimal overhead. The only difference between