+1

On Thu, Feb 2, 2012 at 8:39 PM, Thomas Jungblut <[email protected]> wrote:
> Hey,
>
> I had a bit of time to go through the jira issues and sort out several
> things related to Fault Tolerance.
>
> Here are my results:
>
> Fault Tolerance in Hama (all jiras related):
>
> [HAMA-199] Add fault tolerance to BSPPeer < CLOSE, too generic
> [HAMA-445] Make configurable checkpointing
> [HAMA-440] Features required in recovery procedure.
> [HAMA-498] BSPTask should periodically ping its parent.
>
> Then I have split this into two main parts, "Detect Failure" and "Solve
> Failure":
>
> Detect Failure:
> [HAMA-370] Failure detector for Hama < Nearly complete?
> [HAMA-498] BSPTask should periodically ping its parent.
>
> Solve Failure:
> [HAMA-445] Make configurable checkpointing
>> TODO:
>> Groom needs functionality to restart a task
>> BSPMaster needs functionality to restart a groom
>
> Also here is MISC, which is not strongly related.
>
> MISC:
> [HAMA-445] Make configurable checkpointing
> [HAMA-440] Features required in recovery procedure.
>> TODO mainly discussion:
>> New BSP "interface", with a chaining of supersteps to make restarting
> tasks simpler (contained in 440)
>
>
> Let's make an umbrella jira for this larger task and close 199, since this
> is way too generic and too old.
> We should also split 440, because it combines too many unrelated things
> together.
>
> Also "Lin" has assigned the majority of them. What is your progress? And do
> you mind splitting these?
>
> [LINKS]
> https://issues.apache.org/jira/browse/HAMA-440
> https://issues.apache.org/jira/browse/HAMA-119
> https://issues.apache.org/jira/browse/HAMA-445
> https://issues.apache.org/jira/browse/HAMA-440
> https://issues.apache.org/jira/browse/HAMA-370
> https://issues.apache.org/jira/browse/HAMA-498
>
> --
> Thomas Jungblut
> Berlin <[email protected]>

--
Best Regards, Edward J. Yoon
@eddieyoon
