The o.a.hama.graph.Aggregator interface has the following method:

  public void aggregate(VERTEX vertex, M value);

A couple of things:

1. Why send the value when it can be obtained from the vertex?

2. Why send the complete vertex? In the case of semi-clustering as described
in the Google Pregel paper, each vertex maintains a list of semi-clusters and
their associated data. Since all the vertices are sent to the master in each
superstep, this might become a bottleneck for huge graphs.

3. The o.a.giraph.graph.Aggregator class has a better interface, where only
the values to be aggregated are sent over the wire.

4. Also, will there be a requirement to do the aggregation in only some
supersteps and not all? Let's say we want to calculate the number of
vertices/edges in the input and the output graph. In this scenario,
aggregating in only the first and last supersteps should be enough.
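
To make points 3 and 4 concrete, here is a rough sketch of what I have in
mind. None of these types exist in Hama or Giraph as written: ValueAggregator,
LongSumAggregator, SuperstepFilteredAggregator and setSuperstep are all
hypothetical names, and I am assuming the framework would tell the aggregator
which superstep is running.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical value-only aggregator: only the value is passed, not the vertex. */
interface ValueAggregator<M> {
  void aggregate(M value);

  M getValue();
}

/** Sums Long values, e.g. one count contributed per vertex. */
final class LongSumAggregator implements ValueAggregator<Long> {
  private long sum = 0L;

  @Override
  public void aggregate(Long value) {
    sum += value;
  }

  @Override
  public Long getValue() {
    return sum;
  }
}

/** Hypothetical wrapper that forwards values only in the selected supersteps. */
final class SuperstepFilteredAggregator<M> implements ValueAggregator<M> {
  private final ValueAggregator<M> delegate;
  private final Set<Long> activeSupersteps;
  private long currentSuperstep = 0L;

  SuperstepFilteredAggregator(ValueAggregator<M> delegate, Long... activeSupersteps) {
    this.delegate = delegate;
    this.activeSupersteps = new HashSet<>(Arrays.asList(activeSupersteps));
  }

  /** Assumed to be called by the framework at the start of each superstep. */
  void setSuperstep(long superstep) {
    this.currentSuperstep = superstep;
  }

  @Override
  public void aggregate(M value) {
    // Values offered outside the selected supersteps are dropped.
    if (activeSupersteps.contains(currentSuperstep)) {
      delegate.aggregate(value);
    }
  }

  @Override
  public M getValue() {
    return delegate.getValue();
  }
}
```

With something like this, a vertex would call aggregate(1L) to be counted, and
the wrapper would ignore the call in every superstep other than the ones it
was configured with. The number of the last superstep is usually not known up
front, so that part would need some other signal from the framework.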

Any thoughts? Should I open a JIRA for this?

Thanks,
Praveen
