[ 
https://issues.apache.org/jira/browse/HADOOP-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611539#action_12611539
 ] 

Devaraj Das commented on HADOOP-3245:
-------------------------------------

Some initial comments:
1) Remove the unnecessary comments from JobTracker.java
2) Rename the "restarted" field as "recovering"
3) hasJobTrackerRestarted/Recovered API
4) Remove the comment: "//TODO wait for all the incomplete(previously running) 
jobs to be ready" from offerService
5) Put back the call to completedJobStatusStore.store in finalizeJob
6) The method cleanupJob seems unnecessary. What is already done w.r.t cleanup 
will continue to work.
7) The implementation of wasRecovered and hasRecovered should not call back 
into the JobTracker
8) Synchronization for tasksInited in initTasks is redundant. Do a notify 
instead of notifyAll in the following line.
9) In the interval between the JT death and restart, the reducers might fail to 
fetch map outputs from some tasktrackers (due to faulty map nodes, etc.), but 
they have no one to send the notifications to. The reducers might end up 
killing themselves after a couple of retries.
10) The construction of TaskTrackerStatus should be reverted to how it was done 
earlier (cloneAndResetRunningTaskStatuses called inline with the constructor 
invocation)
11) In TaskTracker.transmitHeartBeat you should call 
cloneAndResetRunningJobTaskStatuses rather than cloneAndResetRunningTaskStatuses
12) Please move the SYNC action handling to the offerService method
13) shouldResetEventsIndex could be cleared upon the first access as opposed to 
doing it in the heartbeat processing
14) Instead of the additional RPC in Umbilical, you can add an argument to 
getMapCompletionEvents to indicate whether to reset or not
15) Factor out common code from 
cloneAndResetRunningJobTaskStatuses/cloneAndResetRunningTaskStatuses
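
For comment 15, a rough sketch of one way the shared logic could be factored 
out. This is not taken from the patch: SketchStatus, StatusCloner, the jobTask 
flag, the pendingDiagnostics counter and the helper cloneAndReset are all 
hypothetical stand-ins, and the assumption that the two methods differ only in 
which statuses they select is mine. It also uses lambdas for brevity, which 
the code base may not be able to rely on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a task status; NOT the real Hadoop TaskStatus.
class SketchStatus {
    final String taskId;
    final boolean jobTask;      // assumed flag: status belongs to a running job
    int pendingDiagnostics;     // stands in for state that is reset after cloning

    SketchStatus(String taskId, boolean jobTask, int pendingDiagnostics) {
        this.taskId = taskId;
        this.jobTask = jobTask;
        this.pendingDiagnostics = pendingDiagnostics;
    }

    // Copy the current state so it can be reported independently.
    SketchStatus copy() {
        return new SketchStatus(taskId, jobTask, pendingDiagnostics);
    }

    // Clear the state that has just been captured in a copy.
    void reset() {
        pendingDiagnostics = 0;
    }
}

class StatusCloner {
    // Shared helper (hypothetical): clone and reset every status that passes
    // the filter, leaving the others untouched.
    static List<SketchStatus> cloneAndReset(List<SketchStatus> running,
                                            Predicate<SketchStatus> include) {
        List<SketchStatus> result = new ArrayList<>();
        for (SketchStatus s : running) {
            if (include.test(s)) {
                result.add(s.copy());
                s.reset();
            }
        }
        return result;
    }

    // Both public methods become one-line wrappers over the helper.
    static List<SketchStatus> cloneAndResetRunningTaskStatuses(
            List<SketchStatus> running) {
        return cloneAndReset(running, s -> true);
    }

    // Assumed selection rule: only statuses flagged as job tasks.
    static List<SketchStatus> cloneAndResetRunningJobTaskStatuses(
            List<SketchStatus> running) {
        return cloneAndReset(running, s -> s.jobTask);
    }
}
```

With this shape, any change to how a status is copied or reset is made in one 
place, and the two existing method names stay available to their callers.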


> Provide ability to persist running jobs (extend HADOOP-1876)
> ------------------------------------------------------------
>
>                 Key: HADOOP-3245
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3245
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Amar Kamat
>         Attachments: HADOOP-3245-v2.5.patch, HADOOP-3245-v2.6.5.patch, 
> HADOOP-3245-v2.6.9.patch
>
>
> This could probably extend the work done in HADOOP-1876. This feature can be 
> applied for things like jobs being able to survive jobtracker restarts.

