Gabor Gevay created FLINK-2548:
----------------------------------

             Summary: VertexCentricIteration should avoid doing a coGroup with 
the edges and the solution set
                 Key: FLINK-2548
                 URL: https://issues.apache.org/jira/browse/FLINK-2548
             Project: Flink
          Issue Type: Improvement
          Components: Gelly
    Affects Versions: 0.9, 0.10
            Reporter: Gabor Gevay
            Assignee: Gabor Gevay


Currently, the performance of vertex centric iteration is suboptimal in those 
iterations where the workset is small, because the complexity of one iteration 
contains the number of edges and vertices of the graph because of coGroups:

VertexCentricIteration.buildMessagingFunction does a coGroup between the edges 
and the workset, to get the neighbors to the messaging UDF. This is problematic 
from a performance point of view, because the coGroup UDF gets called on all 
the edge groups, including those that are not getting any messages.

An analogous problem is present in 
VertexCentricIteration.createResultSimpleVertex at the creation of the updates: 
a coGroup happens between the messages and the solution set, which has the 
number of vertices of the graph included in its complexity.

Both of these coGroups could be avoided by doing a join instead (with the same 
keys that the coGroup uses), and then a groupBy. The complexity of these 
operations would be dominated by the size of the workset, as opposed to the 
number of edges or vertices of the graph. The joins should have the edges and 
the solution set at the build side to achieve this complexity. (They will not 
be rebuilt at every iteration.)

I made some experiments with this, and the initial results seem promising. On 
some workloads, this achieves a 2 times speedup, because later iterations often 
have quite small worksets, and these get a huge speedup from this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to