Here are some out-of-the-box ideas: If the elements lie in a fairly small
range and/or you're willing to work with limited precision, you could use
counting sort. Alternatively, you could find the median iteratively by
bisection: each pass only needs a count of the elements at or below the
current guess, and such counts combine associatively and commutatively. It's easy to think
of
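
To make the two ideas concrete, here is a minimal plain-Python sketch. It is not GraphX code, and all function names are made up for illustration. It assumes integer values in a known range [lo, hi] and returns the lower median. In both variants the only thing that crosses partition boundaries is a count, so the merge step is associative and commutative.

```python
from collections import Counter
from functools import reduce

def local_histogram(values):
    """Counting-sort step: count occurrences of each value in one partition."""
    return Counter(values)

def merge_histograms(a, b):
    """Combiner: adding counts is associative and commutative."""
    return a + b

def median_from_histogram(hist):
    """Walk the values in order until the lower-median rank is reached."""
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    target = (total + 1) // 2  # 1-based rank of the lower median
    seen = 0
    for value in sorted(hist):
        seen += hist[value]
        if seen >= target:
            return value

def count_at_or_below(values, guess):
    """Per-partition work for one bisection pass."""
    return sum(1 for v in values if v <= guess)

def bisect_median(partitions, lo, hi):
    """Find the smallest integer m in [lo, hi] with at least `target` values <= m."""
    total = sum(len(p) for p in partitions)
    target = (total + 1) // 2
    while lo < hi:
        mid = (lo + hi) // 2
        # The per-partition counts are merged with a plain sum.
        if sum(count_at_or_below(p, mid) for p in partitions) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo

if __name__ == "__main__":
    parts = [[5, 1, 9], [3, 7], [2, 8, 4]]
    merged = reduce(merge_histograms, map(local_histogram, parts))
    print(median_from_histogram(merged), bisect_median(parts, 0, 10))
```

The histogram variant needs one pass but its message size grows with the number of distinct values; the bisection variant sends a single integer per pass but needs roughly log2(hi - lo) passes.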
Whoops, I should have mentioned that it's a multivariate median (cf.
http://www.pnas.org/content/97/4/1423.full.pdf). It's easy to compute
when all the values are accessible at once. I'm not sure it's possible
with a combiner. So, I guess the question should be: Can I use
GraphX's Pregel without a
If you need access to all message values in vprog, there's nothing wrong
with building up an array in mergeMsg (option #1). This is what
org.apache.spark.graphx.lib.TriangleCount does, though with sets instead of
arrays. There will be a performance penalty because of the communication,
but it
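
As a rough illustration of the array-building approach, here is a plain-Python simulation. It is not the GraphX API: merge_msg and vprog only mirror the roles of Pregel's mergeMsg and vprog, and geometric_median is a hypothetical helper. It uses a simplified Weiszfeld-style iteration for the geometric (spatial) median as a stand-in for whichever multivariate median is actually wanted.

```python
from functools import reduce
import math

def merge_msg(a, b):
    """Combiner: concatenate lists so the vertex program sees every value."""
    return a + b

def geometric_median(points, iterations=100, eps=1e-9):
    """Weiszfeld-style iteration (assumed stand-in for the multivariate median)."""
    dim = len(points[0])
    # Start from the centroid of the points.
    guess = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    for _ in range(iterations):
        acc = [0.0] * dim
        weight_sum = 0.0
        for p in points:
            d = math.dist(p, guess)
            if d < eps:
                # Plain Weiszfeld is undefined when the guess hits a data
                # point; this sketch simply stops there.
                return guess
            for i in range(dim):
                acc[i] += p[i] / d
            weight_sum += 1.0 / d
        new_guess = [a / weight_sum for a in acc]
        moved = math.dist(new_guess, guess)
        guess = new_guess
        if moved < eps:
            break
    return guess

def vprog(vertex_id, attr, messages):
    """Vertex program: has access to all merged message values at once."""
    return geometric_median(messages)

if __name__ == "__main__":
    # One single-element list per incoming message.
    incoming = [[(0.0, 0.0)], [(2.0, 0.0)], [(1.0, 1.0)], [(1.0, -1.0)]]
    merged = reduce(merge_msg, incoming)
    print(vprog(1, None, merged))
```

List concatenation is associative, and although the element order depends on merge order, the median computed in vprog does not, so the result is the same however the messages are combined.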