How to do operations on multiple RDD's

2014-09-26 Thread Johan Stenberg
Hi, this is my first post to the mailing list, so give me some feedback if I do something wrong. To do operations on two RDDs to produce a new one, you can just use zipPartitions, but if I have an arbitrary number of RDDs that I would like to perform an operation on to produce a single RDD, how do

Re: How to do operations on multiple RDD's

2014-09-26 Thread Daniel Siegmann
There are numerous ways to combine RDDs. In your case, it seems you have several RDDs of the same type and you want to do an operation across all of them as if they were a single RDD. The way to do this is SparkContext.union or RDD.union, which have minimal overhead. The only difference between
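Here is a way to see why a union has little overhead without needing a cluster. This is a minimal plain-Python model, not Spark code. An "RDD" is represented as a list of partitions, each partition being a list of elements. The hypothetical `union` and `map_partitions` helpers only mimic the idea that a union concatenates the inputs' partitions without moving any element between them, so no shuffle is needed. This assumes the inputs do not share a partitioner; if they do, Spark may keep the partition count instead. In Spark itself, the corresponding Scala calls would be `sc.union(Seq(rddA, rddB, rddC))` or `rddA.union(rddB)`.

```python
from typing import Callable, List

# Hypothetical model: an "RDD" is a list of partitions; each partition is a list of ints.
Partitions = List[List[int]]


def union(*rdds: Partitions) -> Partitions:
    """Concatenate the partitions of every input, in order.

    Mimics the general union behaviour: the partition count of the result is the
    sum of the inputs' partition counts, and no element moves between partitions.
    """
    result: Partitions = []
    for rdd in rdds:
        result.extend(rdd)  # reuse each input partition as-is
    return result


def map_partitions(rdd: Partitions, f: Callable[[List[int]], List[int]]) -> Partitions:
    """Apply f to each partition independently (hypothetical stand-in for mapPartitions)."""
    return [f(part) for part in rdd]


# Three "RDDs" of the same element type, with different partition counts.
a: Partitions = [[1, 2], [3]]
b: Partitions = [[4, 5, 6]]
c: Partitions = [[7], [8, 9]]

rdds = [a, b, c]  # an arbitrary number of inputs
combined = union(*rdds)

# Operate across all inputs as if they were a single dataset.
scaled = map_partitions(combined, lambda part: [x * 10 for x in part])
total = sum(sum(part) for part in scaled)

print(len(combined), combined)
print(total)
```

Because the model's `union` takes any number of inputs, a list of arbitrary length works the same way as three inputs; in Spark, `sc.union` likewise accepts a `Seq` of RDDs.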