[
https://issues.apache.org/jira/browse/HAMA-411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062843#comment-13062843
]
ChiaHung Lin commented on HAMA-411:
-----------------------------------
With the BSP model, we can take checkpoints when the computation reaches the
barrier synchronization, since that point forms a consistent global state. So if
a user configures a checkpoint every 3 supersteps, then after a task failure the
computation can roll back to the global state from a few supersteps earlier.
The drawback of such a global checkpoint is that, as the number of processes
involved in the computation increases, rolling back every process to a
consistent global state becomes a larger overhead.
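To make the trade-off concrete, here is a minimal sketch of the rollback
arithmetic. It is not Hama code. The function names, the 1-based superstep
numbering, and the assumption that a checkpoint is written at the barrier ending
every superstep whose number is a multiple of the interval are all illustrative
choices, not anything defined in this issue.

```python
def last_checkpoint(failed_superstep: int, interval: int) -> int:
    """Return the superstep whose barrier holds the newest checkpoint.

    Assumes supersteps are numbered from 1 and a checkpoint is written at the
    barrier ending every superstep that is a multiple of `interval`.
    A result of 0 means "restart from the initial state".
    """
    if interval < 1 or failed_superstep < 1:
        raise ValueError("interval and failed_superstep must be >= 1")
    completed = failed_superstep - 1  # last superstep that reached its barrier
    return (completed // interval) * interval


def rollback_cost(failed_superstep: int, interval: int, num_tasks: int) -> int:
    """Count completed task-supersteps that must be redone after a global rollback.

    Every task rolls back, not only the failed one, so the cost scales with
    the number of tasks (hypothetical cost model for illustration).
    """
    restart_from = last_checkpoint(failed_superstep, interval)
    lost_supersteps = (failed_superstep - 1) - restart_from
    return lost_supersteps * num_tasks


if __name__ == "__main__":
    # Checkpoint every 3 supersteps, failure during superstep 6:
    # the newest checkpoint is at superstep 3, so supersteps 4 and 5 are lost.
    print(last_checkpoint(6, 3))
    print(rollback_cost(6, 3, num_tasks=10))
    print(rollback_cost(6, 3, num_tasks=100))
```

Under this model the number of lost supersteps is bounded by the checkpoint
interval, but the total redone work grows linearly with the number of tasks,
which is the overhead described above.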
> Support checkpoint based on HDFS
> --------------------------------
>
> Key: HAMA-411
> URL: https://issues.apache.org/jira/browse/HAMA-411
> Project: Hama
> Issue Type: New Feature
> Components: bsp
> Reporter: Thomas Jungblut
>
> We need to add checkpointing to Hama to deal with faults in the future.