[ 
https://issues.apache.org/jira/browse/HAMA-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202510#comment-13202510
 ] 

Thomas Jungblut commented on HAMA-503:
--------------------------------------

Hey Lin, 

I have made a bit of an "interface".

For a superstep:
https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/bsp/ft/Superstep.java

For the BSP that can handle faults:

https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/bsp/ft/FaultTolerantBSP.java

The idea is that you initialize a task with a start superstep, which is an 
index into the array of user-defined supersteps. 
When a fault happens, we inject the index of the superstep that failed into the 
new task, so at runtime it starts computation from that point.
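
To make the restart idea concrete, here is a rough sketch of how it could look. It is not the code from the linked Superstep or FaultTolerantBSP classes; the names SuperstepChainSketch, Superstep.compute and runFrom, as well as the List<String> log standing in for real BSP state, are assumptions made up for illustration only.

```java
import java.util.List;

// Hypothetical sketch: none of these names come from Hama or the linked classes.
final class SuperstepChainSketch {

    /** One user-defined unit of computation between two barrier syncs. */
    interface Superstep {
        void compute(List<String> log) throws Exception;
    }

    /**
     * Runs the chained supersteps in array order, beginning at startSuperstep.
     * Returns -1 if the chain ran to the end, otherwise the index of the
     * superstep that failed.
     */
    static int runFrom(Superstep[] chain, int startSuperstep, List<String> log) {
        for (int i = startSuperstep; i < chain.length; i++) {
            try {
                chain[i].compute(log);
            } catch (Exception e) {
                // Assumption: the framework would inject this index into the new task.
                return i;
            }
            // A real framework would call sync() here and exchange the messages.
        }
        return -1;
    }
}
```

A replacement task would then call runFrom(chain, failedIndex, log) with the index returned by the failed run, so the supersteps before that index are not executed again. Recovering the checkpointed messages for that superstep is left out of the sketch.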

I have not really tried to make a real-world BSP example with it, so the 
Superstep class may not be a good interface.

What do you think?
                
> Chainable computations for fault tolerance
> ------------------------------------------
>
>                 Key: HAMA-503
>                 URL: https://issues.apache.org/jira/browse/HAMA-503
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp
>    Affects Versions: 0.4.0
>            Reporter: Thomas Jungblut
>             Fix For: 0.5.0
>
>
> Refactor bsp() to allow checkpointed messages to be recovered. 
> ChiaHung Lin had a fancy idea of chaining superstep classes to make the whole 
> recovery more convenient and less error-prone, or at least possible.
> A user no longer defines a BSP; instead, the user defines a single superstep 
> inside a computation class. A user is able to chain these in a specific 
> order. After each of these computations, the framework calls sync() and 
> exchanges the messages.


        
