[ 
https://issues.apache.org/jira/browse/HADOOP-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615151#action_12615151
 ] 

Owen O'Malley commented on HADOOP-3245:
---------------------------------------

If we are counting on the TaskTrackers' reports to rebuild the state, we should
have a safe-mode equivalent where we wait for 2-3 minutes before launching new
tasks; otherwise we will thrash the cluster as each new TaskTracker reports
back. Please also make sure that the TaskTracker does not reset and lose state
if it gets an IOException when talking to the JobTracker.
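
To make the safe-mode idea concrete, here is a minimal sketch of a gate that
holds back task launches for a fixed quiet period after a restart. The class
name RecoveryGate and its methods are hypothetical and are not part of the
Hadoop code base; the 2-3 minute window is simply passed in as a parameter.

```java
/**
 * Hypothetical sketch, not Hadoop API: blocks task launches for a fixed
 * quiet period after the JobTracker restarts, so that TaskTrackers can
 * report back before any scheduling decision is made.
 */
final class RecoveryGate {
    private final long restartMillis;      // when the JobTracker came back up
    private final long quietPeriodMillis;  // e.g. 2-3 minutes, chosen by the operator

    RecoveryGate(long restartMillis, long quietPeriodMillis) {
        if (quietPeriodMillis < 0) {
            throw new IllegalArgumentException("quiet period must be >= 0");
        }
        this.restartMillis = restartMillis;
        this.quietPeriodMillis = quietPeriodMillis;
    }

    /** True once the quiet period has fully elapsed at the given time. */
    boolean mayLaunchTasks(long nowMillis) {
        return nowMillis - restartMillis >= quietPeriodMillis;
    }
}
```

Under this assumption the heartbeat handler would keep recording incoming task
status while the gate is closed and only start assigning new tasks once
mayLaunchTasks(System.currentTimeMillis()) returns true.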

However, rather than have TaskTrackers store additional information about the
final task status of each completed task in RAM, I think we should reconsider
the option of using the JobHistory as a transaction log for each job. For 
storage on local disk, we probably should support writing a second copy to NFS 
so that a different node could bring up the JobTracker.
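
As a rough illustration of the transaction-log idea, the sketch below appends
each job event to two files, one standing in for the local disk copy and one
for the NFS copy, and replays either copy line by line. MirroredJobLog, its
methods, and the one-event-per-line format are assumptions made for this
example; this is not how JobHistory is actually written.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/**
 * Hypothetical sketch, not Hadoop API: an append-only per-job event log
 * that is written to two locations so another node can replay it.
 */
final class MirroredJobLog {
    private final Path primary;  // e.g. a file on local disk
    private final Path mirror;   // e.g. a file on an NFS mount

    MirroredJobLog(Path primary, Path mirror) {
        this.primary = primary;
        this.mirror = mirror;
    }

    /** Appends one event (must not contain a newline) to both copies. */
    void append(String event) throws IOException {
        if (event.indexOf('\n') >= 0) {
            throw new IllegalArgumentException("event must be a single line");
        }
        byte[] line = (event + "\n").getBytes(StandardCharsets.UTF_8);
        // Note: no explicit sync to disk here; a real log would need one.
        for (Path copy : new Path[] {primary, mirror}) {
            Files.write(copy, line,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }

    /** Reads back all events from one copy, in the order they were written. */
    static List<String> replay(Path copy) throws IOException {
        return Files.readAllLines(copy, StandardCharsets.UTF_8);
    }
}
```

A restarted JobTracker, possibly on a different node, could then call
MirroredJobLog.replay on whichever copy it can reach and rebuild the job state
from the events in order.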

In any case, the extra completed task state should be on disk rather than RAM.
We also need to make sure that the JobHistory is complete and consistent even 
after the restoration.



> Provide ability to persist running jobs (extend HADOOP-1876)
> ------------------------------------------------------------
>
>                 Key: HADOOP-3245
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3245
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Amar Kamat
>         Attachments: HADOOP-3245-v2.5.patch, HADOOP-3245-v2.6.5.patch, 
> HADOOP-3245-v2.6.9.patch, HADOOP-3245-v4.1.patch
>
>
> This could probably extend the work done in HADOOP-1876. This feature can be 
> applied for things like jobs being able to survive jobtracker restarts.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
