Re: GraphX: Help understanding the limitations of Pregel

2014-04-23 Thread Tom Vacek
Here are some out-of-the-box ideas: If the elements lie in a fairly small range and/or you're willing to work with limited precision, you could use counting sort. Moreover, you could iteratively find the median using bisection, which would be associative and commutative. It's easy to think of

Re: GraphX: Help understanding the limitations of Pregel

2014-04-23 Thread Ryan Compton
Whoops, I should have mentioned that it's a multivariate median (cf http://www.pnas.org/content/97/4/1423.full.pdf ). It's easy to compute when all the values are accessible at once. I'm not sure it's possible with a combiner. So, I guess the question should be: Can I use GraphX's Pregel without a

Re: GraphX: Help understanding the limitations of Pregel

2014-04-23 Thread Ankur Dave
If you need access to all message values in vprog, there's nothing wrong with building up an array in mergeMsg (option #1). This is what org.apache.spark.graphx.lib.TriangleCount does, though with sets instead of arrays. There will be a performance penalty because of the communication, but it