+1

On Thu, Feb 2, 2012 at 8:39 PM, Thomas Jungblut wrote:
> Hey,
>
> I had a bit of time to go through the jira issues and sort out several
> things related to Fault Tolerance.
>
> Here are my results:
>
> Fault Tolerance in Hama (all jiras related):
>
> [HAMA-199] Add fault tolerance to BSPPeer < CLOSE, too generic
> [HAMA-445] Make configurable checkpointing
> [HAMA-440] Features required in recovery procedure.
> [HAMA-498] BSPTask should periodically ping its parent.
>
> Then I split these into two main parts, "Detect Failure" and "Solve
> Failure":
>
> Detect Failure:
> [HAMA-370] Failure detector for Hama < Nearly complete?
> [HAMA-498] BSPTask should periodically ping its parent.
>
> Solve Failure:
> [HAMA-445] Make configurable checkpointing
>> TODO:
>> Groom needs functionality to restart a task
>> BSPMaster needs functionality to restart a groom
>
> There is also a MISC group, which is only loosely related.
>
> MISC:
> [HAMA-445] Make configurable checkpointing
> [HAMA-440] Features required in recovery procedure.
>> TODO mainly discussion:
>> New BSP "interface", with a chaining of supersteps to make restarting
>> tasks simpler (contained in 440)
>
>
> Let's make an umbrella jira for this larger task and close 199, since this
> is way too generic and too old.
> We should also split 440, because it combines too many unrelated things.
>
> Also, "Lin" is assigned to the majority of them. What is your progress? And do
> you mind splitting these?
>
> [LINKS]
> https://issues.apache.org/jira/browse/HAMA-440
> https://issues.apache.org/jira/browse/HAMA-119
> https://issues.apache.org/jira/browse/HAMA-445
> https://issues.apache.org/jira/browse/HAMA-370
> https://issues.apache.org/jira/browse/HAMA-498
>
> --
> Thomas Jungblut
> Berlin



-- 
Best Regards, Edward J. Yoon
@eddieyoon
