A common reason for the "Joining ... is slow" message is that you're
joining VertexRDDs without having cached them first. This will cause Spark
to recompute unnecessarily, and as a side effect, the same index will get
created twice and GraphX won't be able to do an efficient zip join.

For example, the following code will counterintuitively produce the
"Joining ... is slow" message:

import org.apache.spark.graphx._  // for VertexRDD; assumes a spark-shell session where sc is defined

val a = VertexRDD(sc.parallelize((1 to 100).map(x => (x.toLong, x))))
a.leftJoin(a) { (id, a, b) => a + b.getOrElse(0) }  // b is an Option, since leftJoin may find no match

The remedy is to call a.cache() before a.leftJoin(a).
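For reference, a minimal sketch of the cached version (same assumptions as above: a spark-shell session with sc and the graphx import):

val a = VertexRDD(sc.parallelize((1 to 100).map(x => (x.toLong, x))))
a.cache()  // materialize a and its index once, so the self-join can reuse it
a.leftJoin(a) { (id, a, b) => a + b.getOrElse(0) }  // now joins via an efficient zip join

With the VertexRDD cached, both sides of the join share the same index, so GraphX avoids recomputing it and the warning goes away.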

Ankur <http://www.ankurdave.com/>