A common reason for the "Joining ... is slow" message is joining VertexRDDs without having cached them first. This causes Spark to recompute them unnecessarily and, as a side effect, the same index gets created twice, so GraphX cannot perform an efficient zip join.
For example, the following code will counterintuitively produce the "Joining ... is slow" message:

    val a = VertexRDD(sc.parallelize((1 to 100).map(x => (x.toLong, x))))
    a.leftJoin(a) { (id, a, b) => a + b }

The remedy is to call a.cache() before a.leftJoin(a).

Ankur <http://www.ankurdave.com/>
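A minimal sketch of the cached version, assuming a Spark shell session where sc is the SparkContext and GraphX is on the classpath (note that leftJoin's merge function receives the right-hand value as an Option, so this sketch unwraps it with getOrElse):

```scala
import org.apache.spark.graphx._

// Build a VertexRDD of 100 (id, value) pairs.
val a = VertexRDD(sc.parallelize((1 to 100).map(x => (x.toLong, x))))

// Materialize a (and its internal index) once, so the self-join below
// reuses the cached index instead of rebuilding it on each side.
a.cache()

// The self-join can now proceed as an efficient zip join,
// without triggering the "Joining ... is slow" warning.
val summed = a.leftJoin(a) { (id, x, y) => x + y.getOrElse(0) }
```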