Em, yes, you're right.

It'll be meaningful when a task fails to launch, and also when a task
fails to load its checkpoint data from the file system.

I thought it would be needless for checkpoint recovery. My misunderstanding :)
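The re-attempt scheme discussed below can be sketched roughly as follows. This is only an illustration (the class and field names here are assumptions modeled loosely after Hadoop's TaskAttemptID, not Hama's actual code): a re-attempt keeps the task index bound to its input split and only bumps an attempt counter, so sorting by task index stays stable across failures.

```java
import java.util.Arrays;

public class AttemptSketch {
    // Minimal stand-in for a TaskAttemptID: fixed task index + attempt number.
    static final class TaskAttempt implements Comparable<TaskAttempt> {
        final int taskIndex;   // bound to the input split; never changes
        final int attempt;     // incremented on each re-launch

        TaskAttempt(int taskIndex, int attempt) {
            this.taskIndex = taskIndex;
            this.attempt = attempt;
        }

        // A failed task is re-launched with the same index.
        TaskAttempt reattempt() {
            return new TaskAttempt(taskIndex, attempt + 1);
        }

        @Override
        public int compareTo(TaskAttempt o) {
            // Ordering depends on the task index only, never the attempt.
            return Integer.compare(taskIndex, o.taskIndex);
        }

        @Override
        public String toString() {
            return "task_" + taskIndex + "_attempt_" + attempt;
        }
    }

    public static void main(String[] args) {
        TaskAttempt[] tasks = {
            new TaskAttempt(0, 0), new TaskAttempt(1, 0), new TaskAttempt(2, 0)
        };
        // Suppose the master task (index 0) fails and is re-launched:
        tasks[0] = tasks[0].reattempt();
        Arrays.sort(tasks);
        // The ascending ordering is unchanged; only the attempt counter moved,
        // so only the host:port mapping in ZooKeeper would need updating.
        System.out.println(Arrays.toString(tasks));
        // → [task_0_attempt_1, task_1_attempt_0, task_2_attempt_0]
    }
}
```

By contrast, launching a replacement as a *new* task would assign a new index, breaking the split binding and the ascending ordering the master-client examples rely on.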

On Thu, Feb 2, 2012 at 4:51 PM, Thomas Jungblut
<[email protected]> wrote:
> Hi Edward,
>
> I would like to get into this fault-tolerance thing ASAP; we have to
> include it in our next release. Its absence is the main argument against
> using Hama in production environments.
> In my opinion, yes, we need these Attempts, for various reasons:
> - each input split is bound to a specific index, tied to the sorted order
> of the task ids
> - there's a mapping in ZooKeeper from host:port to task id
>
> Consider the examples that use the master-client architecture, which
> relies on the tasks being sorted in ascending order.
> If the master task fails, a re-attempt won't break the ordering. Only the
> host:port mapping must be updated in ZooKeeper, and the other tasks have
> to flush their caches and remap the znodes.
> If you add a new task instead, you'll get a lot more pain than you actually want ;)
>
> Attempts are fine, or is there a specific problem you want to avoid?
>
> 2012/2/2 Edward J. Yoon <[email protected]>
>
>> Few Task-related classes e.g., TaskAttemptID .., etc. are copied from
>> Hadoop MapReduce.
>>
>> Do you think we need to implement Task re-attempt mechanism?
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>>
>
>
>
> --
> Thomas Jungblut
> Berlin <[email protected]>



-- 
Best Regards, Edward J. Yoon
@eddieyoon
