Hi, this is my first post to the mailing list, so please let me know if I do something wrong.

To combine two RDDs into a new one you can just use zipPartitions, but what if I have an arbitrary number of RDDs that I would like to combine into a single RDD? I've been reading the docs but haven't found anything.

For example: I have a Seq of RDD[Array[Int]] and I want to take the majority value of each array cell. So if each RDD holds one array, like this:

  [1, 2, 3]
  [0, 0, 0]
  [1, 2, 0]

then the resulting RDD would hold the array [1, 2, 0].

How do I approach this problem? I guess an accumulator variable would be too heavy for this? Otherwise it could hold an array of maps, one per cell, with the values as keys and their frequencies as values.

Essentially I want something like zipPartitions but for arbitrarily many RDDs. Is there any such functionality, or how else would I approach this?

Cheers,
Johan
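
P.S. To make the example concrete, here is a minimal sketch in plain Scala (no Spark involved) of the per-cell majority I have in mind. Each inner Array stands in for the single array held by one RDD. The names MajoritySketch and majorityPerCell are made up for illustration, and the sketch assumes all arrays have the same length and does not define how ties are broken. It says nothing about how to distribute the work, which is the part I am asking about.

```scala
object MajoritySketch {
  // Hypothetical helper: returns the most frequent value at each cell position.
  // Assumes a non-empty input whose arrays all have the same length.
  // If two values are equally frequent in a cell, which one wins is unspecified.
  def majorityPerCell(arrays: Seq[Array[Int]]): Array[Int] = {
    require(arrays.nonEmpty, "need at least one array")
    val width = arrays.head.length
    require(arrays.forall(_.length == width), "arrays must have equal length")

    Array.tabulate(width) { i =>
      // Build a value -> frequency map for cell i across all arrays.
      val counts = arrays
        .map(_(i))
        .groupBy(identity)
        .map { case (value, occurrences) => (value, occurrences.size) }
      // Pick the value with the highest frequency.
      counts.maxBy(_._2)._1
    }
  }

  def main(args: Array[String]): Unit = {
    val input = Seq(Array(1, 2, 3), Array(0, 0, 0), Array(1, 2, 0))
    println(majorityPerCell(input).mkString("[", ", ", "]")) // prints [1, 2, 0]
  }
}
```

The value -> frequency map per cell is the same "array of maps" idea mentioned above, just built locally instead of across RDDs.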