I think you're right that the javadoc isn't specific enough.

  /**
   * Use a registered aggregator in current superstep.
   * Even when the same aggregator should be used in the next
   * superstep, useAggregator needs to be called at the beginning
   * of that superstep in preSuperstep().
   * @param name Name of aggregator
   * @return boolean (false when not registered)
   */
  boolean useAggregator(String name);

This should be augmented to say that none of the Aggregator methods should be called until this method is invoked. Feel free to file a JIRA and fix. Thanks!
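
To illustrate the rule above (call useAggregator() in each superstep before touching the aggregator), here is a minimal, self-contained toy model. It is not Giraph code: the classes AggregatorUsageSketch and LongSum are hypothetical, the method names only mirror the ones mentioned in this thread, and the IllegalStateException is an assumption added to make a missing useAggregator() call visible rather than a description of what Giraph does.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these classes are Giraph's. The names mirror the
// methods mentioned in this thread, and the exception is an assumption added
// here to make a missing useAggregator() call visible.
class AggregatorUsageSketch {

  /** Hypothetical stand-in for a sum aggregator over long values. */
  static final class LongSum {
    private long value;
    void aggregate(long delta) { value += delta; }
    void setAggregatedValue(long newValue) { value = newValue; }
    long getAggregatedValue() { return value; }
  }

  private final Map<String, LongSum> registered = new HashMap<>();
  private final Set<String> usedThisSuperstep = new HashSet<>();

  void registerAggregator(String name) {
    registered.put(name, new LongSum());
  }

  /** Marks the aggregator as used for the current superstep; false if unregistered. */
  boolean useAggregator(String name) {
    if (!registered.containsKey(name)) {
      return false;
    }
    usedThisSuperstep.add(name);
    return true;
  }

  /** Refuses access until useAggregator(name) has been called in this superstep. */
  LongSum getAggregator(String name) {
    if (!usedThisSuperstep.contains(name)) {
      throw new IllegalStateException(
          "call useAggregator(\"" + name + "\") first in this superstep");
    }
    return registered.get(name);
  }

  /** Ends the superstep: the "used" marks do not carry over to the next one. */
  void endSuperstep() {
    usedThisSuperstep.clear();
  }

  public static void main(String[] args) {
    AggregatorUsageSketch context = new AggregatorUsageSketch();
    context.registerAggregator("messages");
    for (int superstep = 0; superstep < 2; superstep++) {
      context.useAggregator("messages");       // start of superstep: mark as used
      LongSum sum = context.getAggregator("messages");
      sum.setAggregatedValue(0L);              // reset only after it is in use
      sum.aggregate(3L);                       // contributions during the superstep
      sum.aggregate(4L);
      System.out.println("superstep " + superstep + ": " + sum.getAggregatedValue());
      context.endSuperstep();
    }
  }
}
```

In this model the "used" mark is cleared at the end of every superstep, which is why the call has to be repeated each time even for the same aggregator.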

If you would like to, please feel free to add Aggregator documentation to https://cwiki.apache.org/confluence/display/GIRAPH/Index


On 5/2/12 12:15 PM, Benjamin Heitmann wrote:

I had to use aggregators for various statistics-reporting tasks,
and I noticed that the aggregator operations need to be used in a very specific order,
especially when the aggregator is reset between supersteps.

I found that the sequence described in RandomMessageBenchmark (in the 
org.apache.giraph.benchmark package)
results in consistent counts for one aggregator across all workers.
The most important thing seems to be to call the reset method
setAggregatedValue() in preSuperstep() of the WorkerContext class,
before calling this.useAggregator().

If I called the reset method in postSuperstep(), then every worker reported a 
different value for the aggregator.

However, the aggregator that is reset between supersteps still reports a wrong value.

I know this because a second aggregator counts the same thing and reports it
after each superstep,
without being reset.

Is this a known issue? Should I file a bug report on it?

In addition, it would be great to document the correct usage of the aggregators.
Even just the javadoc of the aggregator interface might be enough.

Should I try to add some documentation to the aggregator interface?
Then the committers can correct me if that documentation is wrong, I guess.
