In my eyes, org.apache.giraph.graph.Aggregator is simpler and more intuitive to use as a basic interface. Some extensions can be implemented by subclassing the basic interface.
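The layering suggested above could look roughly like this. This is a minimal Java sketch, and every name in it (`ValueAggregator`, `LongSumAggregator`, `EdgeCountAggregator`, `SimpleVertex`) is hypothetical, not part of Hama or Giraph. The basic interface only deals with values. A subclass adds vertex awareness by deriving the value from the vertex and reusing the basic logic.

```java
// Hypothetical value-only "basic" interface, in the spirit of
// org.apache.giraph.graph.Aggregator (not its actual source).
interface ValueAggregator<M> {
    void aggregate(M value); // fold one value into the running result
    M getValue();            // the aggregated result for this superstep
    void reset();            // clear state before the next superstep
}

// Hypothetical minimal vertex stand-in, not Hama's or Giraph's Vertex class.
class SimpleVertex {
    final long id;
    final int numEdges;

    SimpleVertex(long id, int numEdges) {
        this.id = id;
        this.numEdges = numEdges;
    }
}

// Basic sum over long values.
class LongSumAggregator implements ValueAggregator<Long> {
    private long sum = 0L;

    public void aggregate(Long value) { sum += value; }
    public Long getValue() { return sum; }
    public void reset() { sum = 0L; }
}

// Extension by subclassing: a vertex-aware overload that derives the value
// from the vertex (here, its edge count) and delegates to the basic logic.
class EdgeCountAggregator extends LongSumAggregator {
    public void aggregate(SimpleVertex vertex) {
        aggregate(Long.valueOf(vertex.numEdges));
    }
}
```

With this split, callers that only need values never see a vertex parameter. Vertex-aware behaviour is opt-in through a subclass rather than built into the base signature.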
>> > > public void aggregate(VERTEX vertex, M value);

Is the first argument always needed?

On Wed, Jul 4, 2012 at 2:44 PM, Praveen Sripati wrote:
> Thomas,
>
> Thanks for the clarification.
>
> Praveen
>
>
> On Wed, Jul 4, 2012 at 10:54 AM, Thomas Jungblut wrote:
>
>> Hi Praveen,
>>
>> you completely got it wrong how aggregators work, and it would be great if
>> you could look into the source code before asking unnecessary questions.
>> I don't know why Edward is voting that up.
>>
>> > 1. Why send the value when it can be got from the vertex?
>>
>> There is nothing being sent; what is sent is defined by the aggregator and
>> not by the method signature.
>> Example: SumAggregator, the only thing that ever gets sent is what is
>> returned by getValue().
>>
>> > 2. Why send the complete vertex?
>>
>> Do you really think we send the whole vertex? That is ridiculous.
>>
>> > 3. o.a.giraph.graph.Aggregator class has a better interface where only the
>> > values to be aggregated are sent over the wire.
>>
>> See point 1. No, we have the better interface, because you can observe other
>> vertex attributes like the number of edges or previously aggregated values.
>> That is possible with Giraph because you use Aggregators yourself in the
>> vertex code, whereas Hama hides this usage (which is what a framework is
>> for).
>> If you're unhappy with that, feel free to change it.
>>
>> > 4. Also, will there be a requirement to do the aggregation in only some
>> > supersteps and not all? Let's say, to calculate the number of vertices/edges
>> > in the input and the output graph. In this scenario, aggregation in the
>> > first and last superstep should be good.
>>
>> Yes, that is fine. However, a sum aggregator in each superstep is just a
>> 4-byte message and minimal instruction overhead, so I'm pretty sure that it
>> is no big problem running them in each superstep.
>> For everything else you can use Counters; there is a JIRA issue which makes
>> them more "realtime" and makes a set counter available to all peers in the
>> next superstep.
>>
>> 2012/7/4 Edward J. Yoon
>>
>> > +1
>> >
>> > On Wed, Jul 4, 2012 at 12:46 PM, Praveen Sripati wrote:
>> > > The o.a.hama.graph.Aggregator interface has the following method:
>> > >
>> > > public void aggregate(VERTEX vertex, M value);
>> > >
>> > > A couple of things:
>> > >
>> > > 1. Why send the value when it can be got from the vertex?
>> > >
>> > > 2. Why send the complete vertex? In the case of semi-clustering as described
>> > > in the Google Pregel paper, each vertex maintains a list of semi-clusters and
>> > > the data associated with it. Since all the vertices are sent to the master
>> > > in each superstep, this might be a bottleneck with huge graphs.
>> > >
>> > > 3. The o.a.giraph.graph.Aggregator class has a better interface where only
>> > > the values to be aggregated are sent over the wire.
>> > >
>> > > 4. Also, will there be a requirement to do the aggregation in only some
>> > > supersteps and not all? Let's say, to calculate the number of vertices/edges
>> > > in the input and the output graph. In this scenario, aggregation in the
>> > > first and last superstep should be good.
>> > >
>> > > Any thoughts? Should I open a JIRA for this?
>> > >
>> > > Thanks,
>> > > Praveen
>> >
>> > --
>> > Best Regards, Edward J. Yoon
>> > @eddieyoon
>>

--
Best Regards, Edward J. Yoon
@eddieyoon
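Thomas's point is that the two-argument signature does not decide what goes over the wire. This is a minimal Java sketch of that idea. Its names (`VertexValueAggregator`, `IntSumAggregator`, `DemoVertex`, `serializeForMaster`) are hypothetical and only modeled on the quoted `aggregate(VERTEX vertex, M value)` signature, not taken from the Hama source. The vertex is handed to the aggregator for observation, but only the `int` returned by `getValue()` is serialized. With `DataOutputStream.writeInt`, that is exactly 4 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical interface modeled on the signature quoted in the thread.
interface VertexValueAggregator<V, M> {
    void aggregate(V vertex, M value);
    M getValue();
}

// Hypothetical vertex stand-in; an aggregator may read its attributes,
// but nothing here ships the vertex itself anywhere.
class DemoVertex {
    final int numEdges;

    DemoVertex(int numEdges) { this.numEdges = numEdges; }
}

// Sums int values. The vertex argument is ignored in this basic version.
class IntSumAggregator implements VertexValueAggregator<DemoVertex, Integer> {
    private int sum = 0;

    public void aggregate(DemoVertex vertex, Integer value) { sum += value; }

    public Integer getValue() { return sum; }

    // Assumed wire format: only getValue() is written, as a single int.
    byte[] serializeForMaster() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(getValue());
        out.flush();
        return bytes.toByteArray();
    }
}
```

Whether the real Hama runtime serializes aggregator state this way is an assumption. The sketch only illustrates that the method signature and the wire payload are separate concerns.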
