[
https://issues.apache.org/jira/browse/GIRAPH-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453841#comment-13453841
]
Maja Kabiljo commented on GIRAPH-293:
-------------------------------------
Thank you for looking!
The main goal of the patch was to make checkpointing work; separating out the
aggregator code was a bonus :-)
By code duplication, do you mean the parts that read from and write to
ZooKeeper? I didn't pay much attention to making those parts nice, since they
are going away soon. I wanted to minimize the change and make it as easy to
review as possible, so those parts are really just copied directly from the
BspService classes. That's why I keep saying the patch is much smaller than it
looks. The worker and master code differ in, for example, the following ways:
one writes just aggregator names and values, while the other also writes
aggregator classnames (and the opposite for reading); the worker just reads
final values, while the master reads values from all workers and aggregates
them along the way.
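
To illustrate that asymmetry, here is a minimal sketch. It is not the code from
the patch: the class and method names are hypothetical, the byte layout is
invented, ZooKeeper is left out entirely, and it assumes long-valued sum
aggregators. It also assumes the master is the side that writes classnames.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; not Giraph's actual classes or wire format.
class AggregatorExchangeSketch {

  // Worker side: write just aggregator names and partial values.
  static byte[] workerWrite(Map<String, Long> partials) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeInt(partials.size());
      for (Map.Entry<String, Long> e : partials.entrySet()) {
        out.writeUTF(e.getKey());
        out.writeLong(e.getValue());
      }
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Master side: read values from all workers, aggregating along the way.
  // Summation stands in for whatever the real aggregator would do.
  static Map<String, Long> masterReadAndAggregate(List<byte[]> workerBlobs) {
    try {
      Map<String, Long> totals = new LinkedHashMap<>();
      for (byte[] blob : workerBlobs) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob));
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
          String name = in.readUTF();
          long value = in.readLong();
          totals.merge(name, value, Long::sum);
        }
      }
      return totals;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Master side: write names, classnames and final values.
  // Every name in finals is expected to have an entry in classNames.
  static byte[] masterWrite(Map<String, Long> finals, Map<String, String> classNames) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeInt(finals.size());
      for (Map.Entry<String, Long> e : finals.entrySet()) {
        out.writeUTF(e.getKey());
        out.writeUTF(classNames.get(e.getKey()));
        out.writeLong(e.getValue());
      }
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Worker side: just read the final values. The classname is read and
  // ignored here; a real worker could use it to instantiate the aggregator.
  static Map<String, Long> workerRead(byte[] blob) {
    try {
      Map<String, Long> finals = new LinkedHashMap<>();
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob));
      int count = in.readInt();
      for (int i = 0; i < count; i++) {
        String name = in.readUTF();
        in.readUTF(); // classname, unused in this sketch
        finals.put(name, in.readLong());
      }
      return finals;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

In this sketch each worker would call workerWrite with its partial values, the
master would combine all of the blobs with masterReadAndAggregate and publish
the result with masterWrite, and each worker would pick up the final values
with workerRead.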
> Should aggregators be checkpointed?
> -----------------------------------
>
> Key: GIRAPH-293
> URL: https://issues.apache.org/jira/browse/GIRAPH-293
> Project: Giraph
> Issue Type: Bug
> Reporter: Alessandro Presta
> Assignee: Maja Kabiljo
> Attachments: GIRAPH-293.patch, GIRAPH-293.patch, GIRAPH-293.patch
>
>
> As I understand it, we don't include aggregators in checkpoints because they
> are kept in ZooKeeper.
> One of our bootcampers is working on fixing TestManualCheckpoint, which
> currently involves starting a new job from a checkpoint from a previous job*.
> If this is functionality we want going forward, then persistent aggregators
> should be checkpointed.
> [*] That test relies on aggregators either being checkpointed or always being
> reset at each superstep. Neither is happening, but the error is masked by the
> fact that we are not actually resuming from a checkpoint, but re-running the
> job from scratch.