[ https://issues.apache.org/jira/browse/HAMA-431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119952#comment-13119952 ]
Thomas Jungblut commented on HAMA-431:
--------------------------------------
Yes, that is correct.
But I don't see the improvement: if a task fails, the checkpointer running
within that task fails along with it. If you run the checkpointer as a separate
process that guards several tasks, it can fail all the tasks it guards when that
process is not working properly. Armstrong is just referring to the need for
redundancy to absorb failure, but with a single process guarding several tasks
you have introduced another point of failure, one that can have much more impact
than a single failing task.
Each task attempt should write its checkpoints into HDFS, named by its taskID,
attemptID and superstep (as the file name?), so it can be restarted from outside.
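To make the naming idea concrete, here is a minimal sketch, assuming a hypothetical
layout of <root>/<taskID>/attempt_<attemptID>/superstep_<n>. The class name
CheckpointPath, the root directory and the "attempt_"/"superstep_" prefixes are
illustrative assumptions, not Hama's API; real code would write through Hadoop's
FileSystem API rather than build plain strings. It shows how an outside process
could pick the newest checkpoint across all attempts of a task.

```java
import java.util.List;
import java.util.Objects;

/**
 * Hypothetical checkpoint naming scheme: one path per (task, attempt, superstep),
 * so a restarted attempt can be pointed at the latest checkpoint from outside.
 */
final class CheckpointPath {
    private static final String SUPERSTEP_MARKER = "/superstep_";
    private final String root;

    CheckpointPath(String root) {
        this.root = Objects.requireNonNull(root);
    }

    /** Builds e.g. "/bsp/checkpoints/task_0001/attempt_0/superstep_42". */
    String forSuperstep(String taskId, int attemptId, long superstep) {
        if (taskId.isEmpty() || taskId.contains("/")) {
            throw new IllegalArgumentException("invalid taskId: " + taskId);
        }
        if (attemptId < 0 || superstep < 0) {
            throw new IllegalArgumentException("attemptId and superstep must be non-negative");
        }
        return root + "/" + taskId + "/attempt_" + attemptId + SUPERSTEP_MARKER + superstep;
    }

    /** Parses the superstep number back out of a path built by forSuperstep. */
    static long superstepOf(String path) {
        int idx = path.lastIndexOf(SUPERSTEP_MARKER);
        if (idx < 0) {
            throw new IllegalArgumentException("not a checkpoint path: " + path);
        }
        return Long.parseLong(path.substring(idx + SUPERSTEP_MARKER.length()));
    }

    /** Returns the path with the highest superstep across attempts, or null if none. */
    static String latest(List<String> paths) {
        String best = null;
        long bestStep = -1;
        for (String p : paths) {
            long step = superstepOf(p);
            if (step > bestStep) {
                bestStep = step;
                best = p;
            }
        }
        return best;
    }
}
```

Keeping the superstep in the name means a restart only needs a directory listing
to decide where to resume, without reading any checkpoint contents.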
That's just my opinion, but you're the fault-tolerance professional ;)
Still, I would leave this out of scope for now, and we can open another issue to
add it. In this issue we can discuss the benefits of a separate process.
> MapReduce NG integration
> ------------------------
>
> Key: HAMA-431
> URL: https://issues.apache.org/jira/browse/HAMA-431
> Project: Hama
> Issue Type: New Feature
> Reporter: Thomas Jungblut
> Assignee: Thomas Jungblut
> Attachments: job_state.dot, task_phase.dot, task_state.dot
>
>
> We should take a look at how to integrate Hama's BSP Engine into Hadoop's
> nextGen application platform.
> It can currently be found in the 0.23 branch.
--
This message is automatically generated by JIRA.