[
https://issues.apache.org/jira/browse/HAMA-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207059#comment-13207059
]
Suraj Menon edited comment on HAMA-511 at 2/13/12 6:58 PM:
-----------------------------------------------------------
Thanks ChiaHung for the information. It looks like a good read, and I agree
with your opinion that the user should explicitly select this as an option.
Regarding your comment on checkpointing: it would be a bad choice to set the
checkpointing interval to 5 when the "largest superstep unit" (if I may call it
that) of the computation is 7. Even with the interval set at 5, a failure at
superstep 6 would force the large task (the task requiring 7 supersteps) to
start over, but not the smaller ones checkpointed at the 5th superstep. In
today's design, the first superstep of the other small tasks would wait until
the large task finishes, and a failure at the 6th superstep would require
restarting all the tasks from superstep 0. Don't miss the point that the larger
task was not receiving messages from other tasks during this period. Please
note that the superstep count required for a job would be configurable in such
a scenario, and when a task goes into sync it informs ZK which superstep it is
seeking a sync for. getSuperstepCount() for the large task would return its
start superstep count + 7. Your situation also reiterates my earlier point
that instead of coding the checkpoint function as -
private final boolean shouldCheckPointNow() {
  return (conf.getBoolean(Constants.CHECKPOINT_ENABLED, false) &&
      (checkPointInterval != 0) &&
      (getSuperstepCount() % checkPointInterval) == 0);
}
We should have -
private final boolean shouldCheckPointNow() {
  // previousCheckpointSuperstep is the superstep at which a checkpoint was done
  return (conf.getBoolean(Constants.CHECKPOINT_ENABLED, false) &&
      (checkPointInterval != 0) &&
      (getSuperstepCount() - previousCheckpointSuperstep) >= 0);
}
The change is necessary here because we have the selective superstep design in
mind.
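To make the difference between the two checks concrete, here is a minimal,
standalone sketch. It is not Hama code: the class CheckpointPolicy and its
methods are hypothetical names used only for illustration. Note that the
comparison as written above (>= 0) holds for any superstep at or after the
previous checkpoint, so the sketch assumes the intent is to compare the
distance since the last checkpoint against checkPointInterval. It simulates the
large task from the example, whose superstep count advances 7 at a time while
the interval is 5.

```java
// Hypothetical sketch, not part of Hama. Names are illustrative only.
final class CheckpointPolicy {
    private final boolean enabled;            // stands in for Constants.CHECKPOINT_ENABLED
    private final long checkPointInterval;
    private long previousCheckpointSuperstep; // superstep of the last checkpoint

    CheckpointPolicy(boolean enabled, long checkPointInterval) {
        this.enabled = enabled;
        this.checkPointInterval = checkPointInterval;
        this.previousCheckpointSuperstep = 0;
    }

    // Modulo rule: fires only when the count is an exact multiple of the interval.
    boolean moduloRule(long superstepCount) {
        return enabled
            && checkPointInterval != 0
            && superstepCount % checkPointInterval == 0;
    }

    // Difference rule. ASSUMPTION: the threshold is checkPointInterval,
    // i.e. checkpoint once at least one interval has elapsed.
    boolean differenceRule(long superstepCount) {
        return enabled
            && checkPointInterval != 0
            && (superstepCount - previousCheckpointSuperstep) >= checkPointInterval;
    }

    // Remember where the last checkpoint was taken.
    void recordCheckpoint(long superstepCount) {
        previousCheckpointSuperstep = superstepCount;
    }

    public static void main(String[] args) {
        CheckpointPolicy policy = new CheckpointPolicy(true, 5);
        // The large task reports start count + 7 each time it syncs: 7, 14, 21.
        for (long superstep = 7; superstep <= 21; superstep += 7) {
            boolean byModulo = policy.moduloRule(superstep);
            boolean byDifference = policy.differenceRule(superstep);
            if (byDifference) {
                policy.recordCheckpoint(superstep);
            }
            System.out.println("superstep " + superstep
                + ": modulo=" + byModulo + ", difference=" + byDifference);
        }
    }
}
```

With a counter that skips over multiples of the interval, the modulo rule can
go a long time without firing, while the difference rule checkpoints at the
first sync point once an interval has passed.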
> Submitting heterogenous supersteps with precedence constraints on Hama
> ----------------------------------------------------------------------
>
> Key: HAMA-511
> URL: https://issues.apache.org/jira/browse/HAMA-511
> Project: Hama
> Issue Type: New Feature
> Reporter: Suraj Menon
> Priority: Minor
> Attachments: Defining supersteps for BSP.pdf
>
>
> Hama should support submission of jobs with support for:
> 1) Skipping unwanted superstep synchronization.
> 2) Run supersteps with heterogenous nature of computation
> 3) Scheduling supersteps with precedence constraints.
> An explanation of these is provided in the attachment.