+1 It's good to have an umbrella JIRA so we can track this more easily.

Failure detection (HAMA-370) has already been implemented and tested on my
machines.

The first point in HAMA-440 is not needed because it has been integrated
into the BSP task.



On 3 February 2012 09:38, Edward J. Yoon <[email protected]> wrote:
> We can also separate the issue into two parts: 1) cluster high
> availability and 2) fault-tolerant job processing. Only HAMA-370 is
> related to 1).
>
> On Fri, Feb 3, 2012 at 10:23 AM, Edward J. Yoon <[email protected]> wrote:
>> +1
>>
>> On Thu, Feb 2, 2012 at 8:39 PM, Thomas Jungblut
>> <[email protected]> wrote:
>>> Hey,
>>>
>>> I had a bit of time to go through the jira issues and sort out several
>>> things related to Fault Tolerance.
>>>
>>> Here are my results:
>>>
>>> Fault Tolerance in Hama (all related JIRAs):
>>>
>>> [HAMA-199] Add fault tolerance to BSPPeer < CLOSE, too generic
>>> [HAMA-445] Make configurable checkpointing
>>> [HAMA-440] Features required in recovery procedure.
>>> [HAMA-498] BSPTask should periodically ping its parent.
>>>
>>> Then I split this into two main parts, "Detect Failure" and "Solve
>>> Failure":
>>>
>>> Detect Failure:
>>> [HAMA-370] Failure detector for Hama < Nearly complete?
>>> [HAMA-498] BSPTask should periodically ping its parent.
>>>
>>> Solve Failure:
>>> [HAMA-445] Make configurable checkpointing
>>>> TODO:
>>>> Groom needs functionality to restart a task
>>>> BSPMaster needs functionality to restart a groom
>>>
>>> There is also a MISC group, which is less strongly related.
>>>
>>> MISC:
>>> [HAMA-445] Make configurable checkpointing
>>> [HAMA-440] Features required in recovery procedure.
>>>> TODO (mainly discussion):
>>>> New BSP "interface" that chains supersteps to make restarting
>>>> tasks simpler (contained in 440)
>>>
>>>
>>> Let's make an umbrella jira for this larger task and close 199, since this
>>> is way too generic and too old.
>>> We should also split 440, because it combines too many unrelated things
>>> together.
>>>
>>> Also, "Lin" is assigned to the majority of them. What is your progress? And
>>> would you mind splitting these?
>>>
>>> [LINKS]
>>> https://issues.apache.org/jira/browse/HAMA-440
>>> https://issues.apache.org/jira/browse/HAMA-119
>>> https://issues.apache.org/jira/browse/HAMA-445
>>> https://issues.apache.org/jira/browse/HAMA-370
>>> https://issues.apache.org/jira/browse/HAMA-498
>>>
>>> --
>>> Thomas Jungblut
>>> Berlin <[email protected]>
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
