Re: Aggregators in Hama

Edward J. Yoon Wed, 04 Jul 2012 06:28:38 -0700

In my eyes, org.apache.giraph.graph.Aggregator is simpler and more
intuitive to use as a basic interface. Some extensions can be
implemented by subclassing the basic interface.


>> > >   public void aggregate(VERTEX vertex, M value);

Is the first argument always needed?
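
To make the question concrete, here is a minimal sketch. All names in it
(VertexAwareAggregator, SimpleVertex, SketchSumAggregator,
EdgeCountAggregator) are hypothetical and not taken from the Hama or Giraph
source; only the aggregate(vertex, value) / getValue() shape comes from this
thread. A sum-style aggregator ignores the vertex argument, while an
aggregator that observes a vertex attribute such as the number of edges
does need it.

```java
// Hypothetical sketch for discussion only; not the actual Hama or Giraph code.

/** Mirrors the shape of the aggregate(VERTEX vertex, M value) method quoted in this thread. */
interface VertexAwareAggregator<VERTEX, M> {
  void aggregate(VERTEX vertex, M value);

  /** The aggregated result; per the thread, this is the only thing that gets sent. */
  M getValue();
}

/** Hypothetical stand-in for a vertex; only the edge count matters here. */
final class SimpleVertex {
  private final int numEdges;

  SimpleVertex(int numEdges) {
    this.numEdges = numEdges;
  }

  int getNumEdges() {
    return numEdges;
  }
}

/** Sums the values it is given and never looks at the vertex argument. */
final class SketchSumAggregator implements VertexAwareAggregator<SimpleVertex, Integer> {
  private int sum = 0;

  @Override
  public void aggregate(SimpleVertex vertex, Integer value) {
    sum += value; // the vertex argument is unused here
  }

  @Override
  public Integer getValue() {
    return sum;
  }
}

/** Counts edges, so it needs the vertex argument and ignores the value. */
final class EdgeCountAggregator implements VertexAwareAggregator<SimpleVertex, Integer> {
  private int edges = 0;

  @Override
  public void aggregate(SimpleVertex vertex, Integer value) {
    edges += vertex.getNumEdges(); // the value argument is unused here
  }

  @Override
  public Integer getValue() {
    return edges;
  }
}

class AggregatorSketch {
  public static void main(String[] args) {
    SimpleVertex a = new SimpleVertex(2);
    SimpleVertex b = new SimpleVertex(3);

    SketchSumAggregator sum = new SketchSumAggregator();
    EdgeCountAggregator edges = new EdgeCountAggregator();

    // Feed both aggregators the same (vertex, value) pairs.
    sum.aggregate(a, 4);
    sum.aggregate(b, 6);
    edges.aggregate(a, 4);
    edges.aggregate(b, 6);

    System.out.println("sum = " + sum.getValue());
    System.out.println("edges = " + edges.getValue());
  }
}
```

In this sketch a value-only interface would be enough for the first
aggregator but not for the second, which is the trade-off the question is
about.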

On Wed, Jul 4, 2012 at 2:44 PM, Praveen Sripati
<[email protected]> wrote:
> Thomas,
>
> Thanks for the clarification.
>
> Praveen
>
>
> On Wed, Jul 4, 2012 at 10:54 AM, Thomas Jungblut
> <[email protected]> wrote:
>
>> Hi Praveen,
>>
>> you have completely misunderstood how aggregators work, and it would be
>> great if you could look into the source code before asking unnecessary
>> questions. I don't know why Edward is voting that up.
>>
>> 1. Why send the value when it can be got from the vertex?
>>
>>
>> Nothing is being sent here; what is sent is defined by the aggregator and
>> not by the method signature.
>> Example: with SumAggregator, the only thing that ever gets sent is what is
>> returned by getValue().
>>
>> 2. Why send the complete vertex?
>>
>>
>> Do you really think we send the whole vertex? That is ridiculous.
>>
>> 3. o.a.giraph.graph.Aggregator class has a better interface where only the
>> > values to be aggregated are sent over the wire.
>>
>>
>> See point 1. No, we have a better interface, because you can observe other
>> vertex attributes such as the number of edges or previously aggregated
>> values. That is possible in Giraph because you use Aggregators yourself in
>> the vertex code, whereas Hama hides this usage (which is what a framework
>> is for).
>> If you're unhappy with that, feel free to change it.
>>
>> 4. Also, will there be a requirement to do the aggregation in only some
>> > super steps and not all. Let say, to calculate the number of vectors/edges
>> > in the input and the output graph. In this scenario, aggregation in the
>> > first and last super step should be good.
>>
>>
>> Yes, that is fine. However, a sum aggregator in each superstep costs just a
>> 4-byte message and minimal instruction overhead, so I'm pretty sure that
>> running it in each superstep is no big problem.
>> For everything else you can use Counters; there is a JIRA issue which makes
>> them more "realtime" and makes a set counter available to all peers in the
>> next superstep.
>>
>> 2012/7/4 Edward J. Yoon <[email protected]>
>>
>> > +1
>> >
>> > On Wed, Jul 4, 2012 at 12:46 PM, Praveen Sripati
>> > <[email protected]> wrote:
>> > > The o.a.hama.graph.Aggregator interface has the following method
>> > >
>> > >   public void aggregate(VERTEX vertex, M value);
>> > >
>> > > Couple of things
>> > >
>> > > 1. Why send the value when it can be got from the vertex?
>> > >
>> > > 2. Why send the complete vertex? In the case of semi clustering as
>> > > described in the Google Pregel paper, each vertex maintains a list of
>> > > semi clusters and the data associated with it. Since all the vertices
>> > > are sent to the master in each superstep, this might be a bottleneck
>> > > with huge graphs.
>> > >
>> > > 3. o.a.giraph.graph.Aggregator class has a better interface where only
>> > > the values to be aggregated are sent over the wire.
>> > >
>> > > 4. Also, will there be a requirement to do the aggregation in only some
>> > > super steps and not all. Let say, to calculate the number of vectors/edges
>> > > in the input and the output graph. In this scenario, aggregation in the
>> > > first and last super step should be good.
>> > >
>> > > Any thoughts? Should I open a JIRA for the same?
>> > >
>> > > Thanks,
>> > > Praveen
>> >
>> >
>> >
>> > --
>> > Best Regards, Edward J. Yoon
>> > @eddieyoon
>> >
>>



-- 
Best Regards, Edward J. Yoon
@eddieyoon
