Hi Ankur, Thank you! This looks like a nice simplification. There should be some performance improvement since newVerts are not chached now. I’ve added your patch: https://issues.apache.org/jira/browse/SPARK-9436
Best regards, Alexander From: Ankur Dave [mailto:ankurd...@gmail.com] Sent: Tuesday, July 28, 2015 12:05 PM To: Ulanov, Alexander Cc: Robin East; dev@spark.apache.org Subject: Re: Two joins in GraphX Pregel implementation On 27 Jul 2015, at 16:42, Ulanov, Alexander <alexander.ula...@hp.com<mailto:alexander.ula...@hp.com>> wrote: It seems that the mentioned two joins can be rewritten as one outer join You're right. In fact, the outer join can be streamlined further using a method from GraphOps: g = g.joinVertices(messages)(vprog).cache() Then, instead of passing newVerts as the active set for mapReduceTriplets, we could pass `messages`. If you're interested in proposing a PR for this, I've attached a patch with these changes and updates to the comments. On Tue, Jul 28, 2015 at 1:15 AM, Ulanov, Alexander <alexander.ula...@hp.com<mailto:alexander.ula...@hp.com>> wrote: I’ve found two PRs (almost identical) for replacing mapReduceTriplets with aggregateMessages [...] Do you know the reason why this improvement is not pushed? There isn't any performance benefit to switching Pregel to use aggregateMessages while preserving its current interface, because the interface uses Iterators and would require us to wrap and unwrap them anyway. The semantics of aggregateMessagesWithActiveSet are otherwise the same as mapReduceTriplets, so there isn't any functionality we are missing out on. And this change seems too small to justify introducing a new version of Pregel, though it would be worthwhile when combined with other improvements<https://github.com/apache/spark/pull/1217>. Ankur<http://www.ankurdave.com/>