Hi Andrew,
Thanks for your answer.
The reason for the question: I've been trying to contribute to the community
by helping to answer Spark-related questions on Stack Overflow.
(note on that: Given the growing volume on the user list lately, I think it
will need to scale out to other venues, so
Hmm that sounds like it could be done in a custom OutputFormat, but I'm not
familiar enough with custom OutputFormats to say that's the right thing to
do.
On Tue, Jun 3, 2014 at 10:23 AM, Gerard Maas gerard.m...@gmail.com wrote:
Hi Andrew,
Thanks for your answer.
The reason of the
The RDD API has functions to join multiple RDDs, such as PairRDD.join
or PairRDD.cogroup, that take another RDD as input, e.g.
firstRDD.join(secondRDD)
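
To make the semantics of those two operations concrete, here is a rough model in plain Python over local lists of (key, value) pairs. It is only a sketch and not Spark code: the helper names join and cogroup and the sample data are made up for illustration, and a real RDD is distributed rather than a local list.

```python
from collections import defaultdict

def join(left, right):
    """Inner join: emit (key, (v, w)) for every matching pair of values."""
    right_by_key = defaultdict(list)
    for key, w in right:
        right_by_key[key].append(w)
    # Keys that appear on only one side produce no output.
    return [(key, (v, w)) for key, v in left for w in right_by_key.get(key, [])]

def cogroup(left, right):
    """Group both inputs by key: key -> (values from left, values from right)."""
    grouped = defaultdict(lambda: ([], []))
    for key, v in left:
        grouped[key][0].append(v)
    for key, w in right:
        grouped[key][1].append(w)
    return dict(grouped)

# Hypothetical sample data standing in for firstRDD and secondRDD.
first = [("a", 1), ("b", 2), ("a", 3)]
second = [("a", "x"), ("c", "y")]

joined = join(first, second)
grouped = cogroup(first, second)
```

In this model join keeps only keys present on both sides, while cogroup keeps every key and leaves an empty list for the side where the key is missing.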
I'm looking for ways to do the opposite: split an existing RDD. What is the
right way to create derived RDDs from an existing RDD?
e.g.