Re: Aggregators in Hama

Praveen Sripati Tue, 03 Jul 2012 22:45:04 -0700

Thomas,

Thanks for the clarification.


Praveen


On Wed, Jul 4, 2012 at 10:54 AM, Thomas Jungblut
<[email protected]> wrote:

> Hi Praveen,
>
> you completely misunderstood how aggregators work, and it would be great if
> you could look into the source code before asking unnecessary questions.
> I don't know why Edward is voting that up.
>
> 1. Why send the value when it can be got from the vertex?
>
>
> Nothing is being sent. What is sent is defined by the aggregator, not by
> the method signature.
> Example: SumAggregator. The only thing that ever gets sent is what is
> returned by getValue().
>
> 2. Why send the complete vertex?
>
>
> Do you really think we send the whole vertex? That is ridiculous.
>
> 3. o.a.giraph.graph.Aggregator class has a better interface where only the
> > values to be aggregated are sent over the wire.
>
>
> See point 1. No, we have a better interface, because you can observe other
> vertex attributes like the number of edges or previously aggregated values.
> That is possible with Giraph, because there you use Aggregators yourself in
> the vertex code, whereas Hama hides this usage (which is what a framework is
> for).
> If you're unhappy with that, feel free to change it.
>
> 4. Also, will there be a requirement to do the aggregation in only some
> > super steps and not all. Let say, to calculate the number of vectors/edges
> > in the input and the output graph. In this scenario, aggregation in the
> > first and last super step should be good.
>
>
> Yes, that is fine. However, a sum aggregator in each superstep is just a
> 4-byte message with minimal instruction overhead, so I'm pretty sure that
> running them in each superstep is no big problem.
> For everything else you can use Counters; there is a JIRA that makes them
> more "realtime" and makes a set counter available to all peers in the next
> superstep.
>
> 2012/7/4 Edward J. Yoon <[email protected]>
>
> > +1
> >
> > On Wed, Jul 4, 2012 at 12:46 PM, Praveen Sripati
> > <[email protected]> wrote:
> > > The o.a.hama.graph.Aggregator interface has the following method
> > >
> > >   public void aggregate(VERTEX vertex, M value);
> > >
> > > Couple of things
> > >
> > > 1. Why send the value when it can be got from the vertex?
> > >
> > > 2. Why send the complete vertex? In case of semi clustering as described
> > > in the Google Pregel paper, each vertex maintains a list of semi clusters
> > > and the data associated with it. Since, all the vertices are sent to the
> > > master in each superstep this might be a bottleneck with huge graphs.
> > >
> > > 3. o.a.giraph.graph.Aggregator class has a better interface where only
> > > the values to be aggregated are sent over the wire.
> > >
> > > 4. Also, will there be a requirement to do the aggregation in only some
> > > super steps and not all. Let say, to calculate the number of vectors/edges
> > > in the input and the output graph. In this scenario, aggregation in the
> > > first and last super step should be good.
> > >
> > > Any thoughts? Should I open a JIRA for the same.
> > >
> > > Thanks,
> > > Praveen
> >
> >
> >
> > --
> > Best Regards, Edward J. Yoon
> > @eddieyoon
> >
>
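
The point made in the reply above, that the vertex is only inspected locally and that only the result of getValue() would ever be sent, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Hama source: the Aggregator interface here is a hypothetical stand-in modeled on the aggregate(VERTEX vertex, M value) signature quoted in the thread, and DemoVertex, EdgeCountAggregator and AggregatorSketch are names invented for the example.

```java
import java.util.List;

// Hypothetical driver: feeds a few vertices through two aggregators on one
// "peer" and prints the single partial result each aggregator would ship.
class AggregatorSketch {
    public static void main(String[] args) {
        List<DemoVertex> localVertices = List.of(
            new DemoVertex("a", 2), new DemoVertex("b", 0), new DemoVertex("c", 5));

        SumAggregator sum = new SumAggregator();
        EdgeCountAggregator edges = new EdgeCountAggregator();
        for (DemoVertex v : localVertices) {
            // The vertex is passed by reference for local inspection only.
            sum.aggregate(v, 1);
            edges.aggregate(v, 1);
        }

        // Only these two integers would need to leave the peer.
        System.out.println("vertex count partial: " + sum.getValue());
        System.out.println("edge count partial:   " + edges.getValue());
    }
}

// Assumed shape, based on the method quoted in the thread plus getValue().
interface Aggregator<M, V> {
    void aggregate(V vertex, M value); // called locally, once per vertex
    M getValue();                      // the partial result that gets sent
}

// Hypothetical minimal vertex; a real vertex carries far more state.
final class DemoVertex {
    final String id;
    final int numEdges;

    DemoVertex(String id, int numEdges) {
        this.id = id;
        this.numEdges = numEdges;
    }
}

// Sums the per-vertex values; the vertex argument is ignored and never stored.
final class SumAggregator implements Aggregator<Integer, DemoVertex> {
    private int sum = 0;

    @Override
    public void aggregate(DemoVertex vertex, Integer value) {
        sum += value;
    }

    @Override
    public Integer getValue() {
        return sum;
    }
}

// Reads a vertex attribute (edge count) instead of the passed value, which
// is the extra flexibility the two-argument signature allows.
final class EdgeCountAggregator implements Aggregator<Integer, DemoVertex> {
    private int edges = 0;

    @Override
    public void aggregate(DemoVertex vertex, Integer value) {
        edges += vertex.numEdges;
    }

    @Override
    public Integer getValue() {
        return edges;
    }
}
```

In this sketch each aggregator holds a single int no matter how many vertices it sees, which is consistent with the "4 byte message" remark in the reply.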
