Hi,

This is my first post to the mailing list, so please let me know if I do
something wrong.

To combine two RDDs into a new one you can just use zipPartitions, but if
I have an arbitrary number of RDDs that I would like to combine into a
single RDD, how do I do that? I've been reading the docs but haven't found
anything.

For example, say I have a Seq of RDD[Array[Int]] and I want to take the
majority value of each array cell. If each RDD holds a single array, like
this:

[1, 2, 3]
[0, 0, 0]
[1, 2, 0]

Then the resulting RDD would contain the array [1, 2, 0]. How should I
approach this problem? I guess an accumulator variable would be too heavy
here? If not, it could be an array of maps, one per cell, with the values
as keys and their frequencies as values.
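
To make the example concrete, here is a minimal sketch of the per-cell
majority I have in mind. It is plain Python with no Spark involved, so it
only shows the local logic, not how to distribute it. The function name
majority_per_cell is made up for illustration, and I am assuming all
arrays have the same length and that ties can be broken arbitrarily.

```python
from collections import Counter


def majority_per_cell(arrays):
    """Return, for each cell index, the most frequent value across all arrays.

    Hypothetical helper for illustration only; assumes equal-length arrays.
    """
    if not arrays:
        return []
    length = len(arrays[0])
    if any(len(a) != length for a in arrays):
        raise ValueError("all arrays must have the same length")

    # One Counter per cell: value -> frequency (the "array of maps" idea).
    counters = [Counter() for _ in range(length)]
    for array in arrays:
        for counter, value in zip(counters, array):
            counter[value] += 1

    # Pick the most frequent value in each cell; ties are broken arbitrarily.
    return [counter.most_common(1)[0][0] for counter in counters]


result = majority_per_cell([[1, 2, 3], [0, 0, 0], [1, 2, 0]])
print(result)  # [1, 2, 0]
```

What I am missing is the Spark equivalent: a way to feed the matching
partitions of all the RDDs into a function like this one.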

Essentially I want something like zipPartitions but for arbitrarily many
RDDs. Is there any such functionality, or how else would I approach this?

Cheers,

Johan
