how to split data-sets efficiently?

Martin Neumann Sun, 27 Jul 2014 03:58:23 -0700

Hej,

I have a dataset of StringID's and I want to map them to Longs by using a
hash function. I will use the LongID's in a series of Iterative
computations and then map back to StringID's.
Currently I have a map operation that creates tuples with the string and
the long. I have an other mapper cleaning out the String's.


Is there a way to do a operation that allows for more the one output set
(basically split a set into 2 sets)? This would reduce the complexity of
the code a lot.
Also how does the optimizer deal with this case? Does it join both map
operation's together and actually run it as if it would be a split?

cheers Martin

how to split data-sets efficiently?

Reply via email to