[ 
https://issues.apache.org/jira/browse/FLINK-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704489#comment-14704489
 ] 

Stephan Ewen commented on FLINK-2548:
-------------------------------------

(1) CoGrouping with the solution set uses a different runtime operator than 
the regular CoGroup. Search for the {{CoGroupWithSolutionSet}} operator. It is 
effectively a grouping plus a join.
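
To illustrate "grouping plus join", here is a minimal plain-Java sketch. It is
not Flink code: the class name, the method name, and the use of a map as the
indexed solution set are assumptions made for illustration only. The workset
is grouped by key, and each group triggers a single lookup into the solution
set, so solution-set keys that receive no workset records are never visited.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SolutionSetCoGroupSketch {

    /**
     * Groups the workset records {key, value} by key, then probes the indexed
     * solution set once per group. Returns one "key:state:values" line per group.
     */
    static List<String> coGroupWithSolutionSet(List<long[]> workset,
                                               Map<Long, String> solutionSet) {
        // Grouping step: collect the workset values per key.
        // A TreeMap is used only to make the output order deterministic.
        Map<Long, List<Long>> groups = new TreeMap<>();
        for (long[] record : workset) {
            groups.computeIfAbsent(record[0], k -> new ArrayList<>()).add(record[1]);
        }

        // Join step: one lookup per group; state is null if the key is unknown.
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, List<Long>> group : groups.entrySet()) {
            String state = solutionSet.get(group.getKey());
            out.add(group.getKey() + ":" + state + ":" + group.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Long, String> solutionSet = Map.of(1L, "a", 2L, "b", 3L, "c");
        List<long[]> workset = List.of(
                new long[] {3, 30}, new long[] {1, 10}, new long[] {3, 31});
        // Key 2 is in the solution set but not in the workset, so it is skipped.
        System.out.println(coGroupWithSolutionSet(workset, solutionSet));
        // prints [1:a:[10], 3:c:[30, 31]]
    }
}
```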

(2) To have an iterator for the "message" sending, you can group by source 
vertex. Then you group again by target vertex, to have an iterator for incoming 
messages (either reduce+join or CoGroup). So you now have join + reduce + 
reduce + join in the iteration?
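
The join + group + group + join sequence can be sketched in plain Java as
follows. This is an assumption-laden illustration rather than Gelly code: the
class and method names are hypothetical, the edges are assumed to be indexed
by source vertex ahead of time, and minimum-label propagation stands in for
the user's messaging and update functions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TwoStageGroupingSketch {

    /** One superstep of minimum-label propagation; returns the changed vertices. */
    static Map<Long, Long> superstep(Map<Long, Long> workset,
                                     Map<Long, List<Long>> edgesBySource,
                                     Map<Long, Long> solutionSet) {
        // Join: pair every active vertex with its out-edges -> {source, label, target}.
        List<long[]> joined = new ArrayList<>();
        for (Map.Entry<Long, Long> vertex : workset.entrySet()) {
            for (long target : edgesBySource.getOrDefault(vertex.getKey(), List.of())) {
                joined.add(new long[] {vertex.getKey(), vertex.getValue(), target});
            }
        }

        // Group by source vertex: the messaging function sees an iterator of edges.
        Map<Long, List<long[]>> bySource = new TreeMap<>();
        for (long[] pair : joined) {
            bySource.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair);
        }

        // Messaging, then group by target vertex and reduce incoming messages (min).
        Map<Long, Long> minByTarget = new TreeMap<>();
        for (List<long[]> group : bySource.values()) {
            for (long[] pair : group) {
                minByTarget.merge(pair[2], pair[1], Long::min);
            }
        }

        // Join with the solution set: keep only messages that improve a label.
        Map<Long, Long> changed = new TreeMap<>();
        for (Map.Entry<Long, Long> msg : minByTarget.entrySet()) {
            Long current = solutionSet.get(msg.getKey());
            if (current != null && msg.getValue() < current) {
                solutionSet.put(msg.getKey(), msg.getValue());
                changed.put(msg.getKey(), msg.getValue());
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<Long, List<Long>> edges = Map.of(1L, List.of(2L, 3L), 2L, List.of(3L));
        Map<Long, Long> solutionSet = new HashMap<>(Map.of(1L, 1L, 2L, 2L, 3L, 3L));
        Map<Long, Long> workset = Map.of(1L, 1L); // only vertex 1 is active
        System.out.println(superstep(workset, edges, solutionSet));
        // prints {2=1, 3=1}
    }
}
```

The work done per superstep in this sketch depends on the active vertices and
their out-edges, not on the total number of edges or vertices.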

(3) Memory safe means there is never an OOM exception, no matter how large the 
out-degree of a vertex. It is achieved by not holding a list of the elements, 
but streaming the edges through the operator in a pipelined fashion.
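
A minimal plain-Java sketch of the streaming idea follows. The names are
hypothetical and the lazily generated iterator merely stands in for edges
arriving from a pipelined input; it is not how Flink's runtime is implemented.
The point is that the messaging loop keeps only a counter, so its memory use
does not grow with the out-degree.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.LongConsumer;

public class StreamingEdgesSketch {

    /** Lazily yields the edge targets 0..outDegree-1 of one vertex, one at a time. */
    static Iterator<Long> outEdges(long outDegree) {
        return new Iterator<Long>() {
            private long cursor = 0;

            @Override
            public boolean hasNext() {
                return cursor < outDegree;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return cursor++;
            }
        };
    }

    /** Sends one message per edge as it arrives; no list of edges is ever built. */
    static long sendMessages(Iterator<Long> edges, LongConsumer sink) {
        long sent = 0;
        while (edges.hasNext()) {
            sink.accept(edges.next());
            sent++;
        }
        return sent;
    }

    public static void main(String[] args) {
        long[] checksum = {0};
        // A vertex with one million out-edges, consumed in constant memory.
        long sent = sendMessages(outEdges(1_000_000L), target -> checksum[0] += target);
        System.out.println(sent + " messages, checksum " + checksum[0]);
        // prints 1000000 messages, checksum 499999500000
    }
}
```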

> VertexCentricIteration should avoid doing a coGroup with the edges and the 
> solution set
> ---------------------------------------------------------------------------------------
>
>                 Key: FLINK-2548
>                 URL: https://issues.apache.org/jira/browse/FLINK-2548
>             Project: Flink
>          Issue Type: Improvement
>          Components: Gelly
>    Affects Versions: 0.9, 0.10
>            Reporter: Gabor Gevay
>            Assignee: Gabor Gevay
>
> Currently, the performance of vertex centric iteration is suboptimal in those 
> iterations where the workset is small, because the complexity of one 
> iteration contains the number of edges and vertices of the graph because of 
> coGroups:
> VertexCentricIteration.buildMessagingFunction does a coGroup between the 
> edges and the workset, to get the neighbors to the messaging UDF. This is 
> problematic from a performance point of view, because the coGroup UDF gets 
> called on all the edge groups, including those that are not getting any 
> messages.
> An analogous problem is present in 
> VertexCentricIteration.createResultSimpleVertex at the creation of the 
> updates: a coGroup happens between the messages and the solution set, which 
> has the number of vertices of the graph included in its complexity.
> Both of these coGroups could be avoided by doing a join instead (with the 
> same keys that the coGroup uses), and then a groupBy. The complexity of these 
> operations would be dominated by the size of the workset, as opposed to the 
> number of edges or vertices of the graph. The joins should have the edges and 
> the solution set at the build side to achieve this complexity. (They will not 
> be rebuilt at every iteration.)
> I made some experiments with this, and the initial results seem promising. On 
> some workloads, this achieves a 2 times speedup, because later iterations 
> often have quite small worksets, and these get a huge speedup from this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
