[jira] [Commented] (GIRAPH-293) Should aggregators be checkpointed?

Maja Kabiljo (JIRA) Wed, 12 Sep 2012 01:54:19 -0700

    [ https://issues.apache.org/jira/browse/GIRAPH-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453841#comment-13453841 ]


Maja Kabiljo commented on GIRAPH-293:
-------------------------------------

Thank you for looking!

The main goal of the patch was to make checkpointing work; separating out the 
aggregator code was a bonus :-)

By code duplication, do you mean the parts that read from and write to 
ZooKeeper? I didn't pay much attention to making those parts nice, since they 
are going away soon. I wanted to minimize the change and make it as easy to 
review as possible, so those parts are really just copied directly from the 
BspService classes. That's why I keep saying the patch is much smaller than it 
looks. The worker and master code differ in, for example, these ways: one 
writes just aggregator names and values, while the other also writes 
aggregator class names (and the opposite holds for reading); the worker just 
reads final values, while the master reads values from all workers and 
aggregates them along the way.
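
To make the worker/master asymmetry above concrete, here is a minimal sketch. 
It is not the patch code: the class and method names are hypothetical, 
in-memory byte arrays stand in for ZooKeeper, a plain sum stands in for a real 
aggregator, and the assignment of "names and values only" to the worker write 
path is an assumption made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from the Giraph code base.
class AggregatorIoSketch {

  // Worker write (assumed): aggregator names and partial values only.
  static byte[] workerWrite(Map<String, Long> partials) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeInt(partials.size());
      for (Map.Entry<String, Long> e : partials.entrySet()) {
        out.writeUTF(e.getKey());
        out.writeLong(e.getValue());
      }
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Master read: consume the data of every worker, aggregating along the way.
  // Summation is a stand-in for whatever the real aggregator does.
  static Map<String, Long> masterAggregate(List<byte[]> workerData) {
    try {
      Map<String, Long> totals = new LinkedHashMap<>();
      for (byte[] data : workerData) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
          String name = in.readUTF();
          long value = in.readLong();
          totals.merge(name, value, Long::sum);
        }
      }
      return totals;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Master write (assumed): names, aggregator class names, and final values.
  static byte[] masterWrite(Map<String, Long> finals, String className) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeInt(finals.size());
      for (Map.Entry<String, Long> e : finals.entrySet()) {
        out.writeUTF(e.getKey());
        out.writeUTF(className);
        out.writeLong(e.getValue());
      }
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Worker read: just the final values; the class name is read and ignored
  // here, where real code would use it to instantiate the aggregator.
  static Map<String, Long> workerReadFinal(byte[] data) {
    try {
      Map<String, Long> finals = new LinkedHashMap<>();
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
      int count = in.readInt();
      for (int i = 0; i < count; i++) {
        String name = in.readUTF();
        in.readUTF(); // aggregator class name, unused in this sketch
        finals.put(name, in.readLong());
      }
      return finals;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

In this sketch the two write formats differ by one extra string per 
aggregator, and only the master read path does any combining; that is the 
kind of small difference that makes the copied code look duplicated.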
                
> Should aggregators be checkpointed?
> -----------------------------------
>
>                 Key: GIRAPH-293
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-293
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Alessandro Presta
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-293.patch, GIRAPH-293.patch, GIRAPH-293.patch
>
>
> As I understand it, we don't include aggregators in checkpoints because they 
> are kept in ZooKeeper.
> One of our bootcampers is working on fixing TestManualCheckpoint, which 
> currently involves starting a new job from a checkpoint from a previous job*.
> If this is functionality we want going forward, then persistent aggregators 
> should be checkpointed.
> [*] That test relies on the assumption that either aggregators are 
> checkpointed or they are always reset at each superstep. Neither of these is 
> happening, but the error is cancelled out by the fact that we are not 
> actually resuming from a checkpoint, but re-running the job from scratch.
