Hi Ankur,

Thank you! This looks like a nice simplification. There should be some 
performance improvement since newVerts are not chached now.
I’ve added your patch:
https://issues.apache.org/jira/browse/SPARK-9436

Best regards, Alexander

From: Ankur Dave [mailto:ankurd...@gmail.com]
Sent: Tuesday, July 28, 2015 12:05 PM
To: Ulanov, Alexander
Cc: Robin East; dev@spark.apache.org
Subject: Re: Two joins in GraphX Pregel implementation

On 27 Jul 2015, at 16:42, Ulanov, Alexander 
<alexander.ula...@hp.com<mailto:alexander.ula...@hp.com>> wrote:
It seems that the mentioned two joins can be rewritten as one outer join

You're right. In fact, the outer join can be streamlined further using a method 
from GraphOps:

g = g.joinVertices(messages)(vprog).cache()

Then, instead of passing newVerts as the active set for mapReduceTriplets, we 
could pass `messages`.

If you're interested in proposing a PR for this, I've attached a patch with 
these changes and updates to the comments.

On Tue, Jul 28, 2015 at 1:15 AM, Ulanov, Alexander 
<alexander.ula...@hp.com<mailto:alexander.ula...@hp.com>> wrote:
I’ve found two PRs (almost identical) for replacing mapReduceTriplets with 
aggregateMessages
[...]
Do you know the reason why this improvement is not pushed?

There isn't any performance benefit to switching Pregel to use 
aggregateMessages while preserving its current interface, because the interface 
uses Iterators and would require us to wrap and unwrap them anyway. The 
semantics of aggregateMessagesWithActiveSet are otherwise the same as 
mapReduceTriplets, so there isn't any functionality we are missing out on. And 
this change seems too small to justify introducing a new version of Pregel, 
though it would be worthwhile when combined with other 
improvements<https://github.com/apache/spark/pull/1217>.

Ankur<http://www.ankurdave.com/>

Reply via email to