Thomas,

Thanks for the clarification.

Praveen

On Wed, Jul 4, 2012 at 10:54 AM, Thomas Jungblut <[email protected]> wrote:

> Hi Praveen,
>
> you got it completely wrong how aggregators work, and it would be great if
> you could look into the source code before asking unnecessary questions.
> Don't know why Edward is voting that up.
>
> > 1. Why send the value when it can be got from the vertex?
>
> There is nothing being sent; what is sent is defined by the aggregator and
> not by the method signature.
> Example: SumAggregator, the only thing that ever gets sent is what is
> returned by getValue().
>
> > 2. Why send the complete vertex?
>
> Do you really think we send the whole vertex? That is ridiculous.
>
> > 3. o.a.giraph.graph.Aggregator class has a better interface where only
> > the values to be aggregated are sent over the wire.
>
> See point 1. No, we have a better interface because you can observe other
> vertex attributes like the number of edges or previous aggregated values.
> That is possible with Giraph because you use Aggregators yourself in the
> vertex code, whereas Hama hides this usage (which is what a framework is
> for).
> If you're unhappy with that, feel free to change it.
>
> > 4. Also, will there be a requirement to do the aggregation in only some
> > super steps and not all. Let's say, to calculate the number of
> > vectors/edges in the input and the output graph. In this scenario,
> > aggregation in the first and last super step should be good.
>
> Yes, that is fine. However, a sum aggregator in each superstep is just a
> 4 byte message and minimal instruction overhead, so I'm pretty sure that
> it is no big problem running them in each superstep.
> For everything else you can use Counters; there is a JIRA which makes them
> more "realtime" and makes set counters available in the next superstep to
> all peers.
>
> 2012/7/4 Edward J. Yoon <[email protected]>
>
> > +1
> >
> > On Wed, Jul 4, 2012 at 12:46 PM, Praveen Sripati
> > <[email protected]> wrote:
> > > The o.a.hama.graph.Aggregator interface has the following method
> > >
> > > public void aggregate(VERTEX vertex, M value);
> > >
> > > Couple of things
> > >
> > > 1. Why send the value when it can be got from the vertex?
> > >
> > > 2. Why send the complete vertex? In case of semi clustering as
> > > described in the Google Pregel paper, each vertex maintains a list of
> > > semi clusters and the data associated with it. Since all the vertices
> > > are sent to the master in each superstep, this might be a bottleneck
> > > with huge graphs.
> > >
> > > 3. o.a.giraph.graph.Aggregator class has a better interface where only
> > > the values to be aggregated are sent over the wire.
> > >
> > > 4. Also, will there be a requirement to do the aggregation in only some
> > > super steps and not all. Let's say, to calculate the number of
> > > vectors/edges in the input and the output graph. In this scenario,
> > > aggregation in the first and last super step should be good.
> > >
> > > Any thoughts? Should I open a JIRA for the same?
> > >
> > > Thanks,
> > > Praveen
> >
> > --
> > Best Regards, Edward J. Yoon
> > @eddieyoon
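
For anyone reading this thread later, here is a minimal sketch of the point Thomas makes above: aggregate(vertex, value) is a local call, and only what getValue() returns would ever leave the peer. Everything in it is hypothetical and written for illustration. SketchVertex, SketchAggregator and EdgeCountAggregator are made-up names, and the SumAggregator here is a stand-in, not the Hama class; none of this is the real o.a.hama.graph API, and no networking is modelled.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class AggregatorSketch {

    /** Hypothetical stand-in for a vertex; NOT the Hama Vertex class. */
    static final class SketchVertex {
        final String id;
        final List<String> edges;

        SketchVertex(String id, List<String> edges) {
            this.id = id;
            this.edges = edges;
        }
    }

    /** Hypothetical interface that only mirrors the shape of aggregate(VERTEX, M). */
    interface SketchAggregator<V, M> {
        void aggregate(V vertex, M value); // local call, nothing is sent here
        M getValue();                      // the only thing a peer would send
    }

    /** Sums the values it is given and ignores the vertex entirely. */
    static final class SumAggregator implements SketchAggregator<SketchVertex, Integer> {
        private int sum = 0;

        @Override
        public void aggregate(SketchVertex vertex, Integer value) {
            sum += value;
        }

        @Override
        public Integer getValue() {
            return sum;
        }
    }

    /** Uses a vertex attribute (edge count) instead of the passed value. */
    static final class EdgeCountAggregator implements SketchAggregator<SketchVertex, Integer> {
        private int edgeCount = 0;

        @Override
        public void aggregate(SketchVertex vertex, Integer value) {
            edgeCount += vertex.edges.size();
        }

        @Override
        public Integer getValue() {
            return edgeCount;
        }
    }

    public static void main(String[] args) {
        // Vertices that happen to live on one peer in this toy setup.
        List<SketchVertex> local = Arrays.asList(
            new SketchVertex("a", Arrays.asList("b", "c")),
            new SketchVertex("b", Arrays.asList("c")),
            new SketchVertex("c", Collections.<String>emptyList()));

        SumAggregator sum = new SumAggregator();
        EdgeCountAggregator edges = new EdgeCountAggregator();
        for (SketchVertex v : local) {
            sum.aggregate(v, 2);   // each vertex contributes the value 2
            edges.aggregate(v, 0); // value is ignored, the vertex is inspected
        }

        // One Integer per aggregator is all that would cross the wire.
        System.out.println(sum.getValue());   // prints 6
        System.out.println(edges.getValue()); // prints 3
    }
}
```

The second aggregator is there to show why passing the vertex can be useful: it reads the edge count locally, yet its per-peer result is still a single integer.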
