We can also separate the issue into two parts: 1) cluster high availability and 2) fault-tolerant job processing. Only HAMA-370 is related to 1).
On Fri, Feb 3, 2012 at 10:23 AM, Edward J. Yoon <[email protected]> wrote:
> +1
>
> On Thu, Feb 2, 2012 at 8:39 PM, Thomas Jungblut
> <[email protected]> wrote:
>> Hey,
>>
>> I had a bit of time to go through the jira issues and sort out several
>> things related to Fault Tolerance.
>>
>> Here are my results:
>>
>> Fault Tolerance in Hama (all jiras related):
>>
>> [HAMA-199] Add fault tolerance to BSPPeer < CLOSE, too generic
>> [HAMA-445] Make configurable checkpointing
>> [HAMA-440] Features required in recovery procedure.
>> [HAMA-498] BSPTask should periodically ping its parent.
>>
>> Then I split this into two main parts, "Detect Failure" and "Solve
>> Failure":
>>
>> Detect Failure:
>> [HAMA-370] Failure detector for Hama < Nearly complete?
>> [HAMA-498] BSPTask should periodically ping its parent.
>>
>> Solve Failure:
>> [HAMA-445] Make configurable checkpointing
>>> TODO:
>>> Groom needs functionality to restart a task
>>> BSPMaster needs functionality to restart a groom
>>
>> There is also MISC, which is not strongly related.
>>
>> MISC:
>> [HAMA-445] Make configurable checkpointing
>> [HAMA-440] Features required in recovery procedure.
>>> TODO, mainly discussion:
>>> New BSP "interface", with a chaining of supersteps to make restarting
>>> tasks simpler (contained in 440)
>>
>>
>> Let's make an umbrella jira for this larger task and close 199, since it
>> is way too generic and too old.
>> We should also split 440, because it combines too many unrelated things.
>>
>> Also, "Lin" has been assigned the majority of them. What is your progress?
>> And do you mind splitting these?
>>
>> [LINKS]
>> https://issues.apache.org/jira/browse/HAMA-440
>> https://issues.apache.org/jira/browse/HAMA-119
>> https://issues.apache.org/jira/browse/HAMA-445
>> https://issues.apache.org/jira/browse/HAMA-440
>> https://issues.apache.org/jira/browse/HAMA-370
>> https://issues.apache.org/jira/browse/HAMA-498
>>
>> --
>> Thomas Jungblut
>> Berlin <[email protected]>
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon

--
Best Regards, Edward J. Yoon
@eddieyoon
