The o.a.hama.graph.Aggregator interface has the following method:

    public void aggregate(VERTEX vertex, M value);

A couple of things:

1. Why send the value when it can be obtained from the vertex?

2. Why send the complete vertex? In the case of semi-clustering as described in the Google Pregel paper, each vertex maintains a list of semi-clusters and the data associated with them. Since all the vertices are sent to the master in each superstep, this might be a bottleneck with huge graphs.

3. The o.a.giraph.graph.Aggregator class has a better interface, where only the values to be aggregated are sent over the wire.

4. Also, will there be a requirement to do the aggregation in only some supersteps rather than all of them? Say, to calculate the number of vertices/edges in the input and the output graph. In this scenario, aggregating in only the first and last supersteps should be enough.

Any thoughts? Should I open a JIRA for this?

Thanks,
Praveen
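
To make points 3 and 4 concrete, here is a rough sketch of what a value-only aggregator with selective supersteps might look like. This is not the Hama or Giraph API: ValueAggregator, isActive, FirstLastCountAggregator and AggregatorSketch are hypothetical names made up for illustration, and the sketch assumes the last superstep number is known in advance (for example, a fixed maximum iteration count), which may not hold for jobs that stop when all vertices vote to halt.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical interface for illustration; NOT the actual Hama or Giraph API.
interface ValueAggregator<M> {
    /** Fold one value into the running result. Only the value is passed, not the vertex. */
    void aggregate(M value);

    /** Current aggregated result. */
    M getValue();

    /** Whether aggregation should run in the given superstep (assumes lastSuperstep is known). */
    boolean isActive(long superstep, long lastSuperstep);
}

/** Sums the contributed values, but only in the first and last supersteps. */
class FirstLastCountAggregator implements ValueAggregator<Long> {
    private long count = 0L;

    @Override
    public void aggregate(Long value) {
        count += value;
    }

    @Override
    public Long getValue() {
        return count;
    }

    @Override
    public boolean isActive(long superstep, long lastSuperstep) {
        return superstep == 0 || superstep == lastSuperstep;
    }
}

class AggregatorSketch {
    /** Simulates one superstep: each vertex contributes 1 if the aggregator is active. */
    static long countVertices(List<String> vertexIds, long superstep, long lastSuperstep) {
        FirstLastCountAggregator agg = new FirstLastCountAggregator();
        if (agg.isActive(superstep, lastSuperstep)) {
            for (String ignoredId : vertexIds) {
                agg.aggregate(1L); // only a small value goes "over the wire"
            }
        }
        return agg.getValue();
    }

    public static void main(String[] args) {
        List<String> vertices = Arrays.asList("a", "b", "c");
        System.out.println(countVertices(vertices, 0, 5)); // first superstep: prints 3
        System.out.println(countVertices(vertices, 3, 5)); // middle superstep: prints 0
    }
}
```

With something along these lines, a vertex holding a large semi-cluster list would only ship the value it wants aggregated, and supersteps where the aggregator is inactive would send nothing to the master.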
