[ https://issues.apache.org/jira/browse/HADOOP-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611539#action_12611539 ]
Devaraj Das commented on HADOOP-3245:
-------------------------------------
Some initial comments:
1) Remove the unnecessary comments from JobTracker.java
2) Rename the "restarted" field as "recovering"
3) hasJobTrackerRestarted/Recovered API
4) Remove the comment: "//TODO wait for all the incomplete(previously running)
jobs to be ready" from offerService
5) Put back the call to completedJobStatusStore.store in finalizeJob
6) The method cleanupJob seems unnecessary. The existing cleanup logic will
continue to work.
7) The implementation of wasRecovered and hasRecovered should not call back
into the JobTracker
8) Synchronization for tasksInited in initTasks is redundant. Do a notify
instead of notifyAll in the following line.
9) In the interval between the JT death and restart, the reducers might fail to
fetch map outputs from some tasktrackers (due to faulty map nodes, etc.), but
they have no one to send the notifications to. The reducers might end up killing
themselves after a couple of retries.
10) The construction of TaskTrackerStatus should be reverted to how it was done
earlier (cloneAndResetRunningTaskStatuses called inline with the constructor
invocation)
11) In TaskTracker.transmitHeartBeat you should call
cloneAndResetRunningJobTaskStatuses rather than cloneAndResetRunningTaskStatuses
12) Please move the SYNC action handling to the offerService method
13) shouldResetEventsIndex could be cleared upon the first access as opposed to
doing it in the heartbeat processing
14) Instead of the additional RPC in Umbilical, you can add an arg to
getMapCompletionEvents that indicates whether to reset or not
15) Factor out common code from
cloneAndResetRunningJobTaskStatuses/cloneAndResetRunningTaskStatuses
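
To make comments 13 and 14 concrete, here is a rough sketch of a reset flag
that is cleared on first access and travels with the existing events call
rather than through a separate RPC. Everything in it is hypothetical and not
taken from the patch: the class names, the String event type, markReset, and
the choice to carry the flag on the return value (one possible reading of
"add an arg").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch only; none of these types come from the patch.
final class EventsIndexResetSketch {

    /** Result of one poll: the events plus a hint to restart from index 0. */
    static final class EventsUpdate {
        final List<String> events;
        final boolean reset;

        EventsUpdate(List<String> events, boolean reset) {
            this.events = events;
            this.reset = reset;
        }
    }

    /** Stand-in for the tracker-side state that serves map completion events. */
    static final class EventFetcher {
        private final List<String> events = new ArrayList<>();
        private final AtomicBoolean shouldResetEventsIndex = new AtomicBoolean(false);

        synchronized void addEvent(String event) {
            events.add(event);
        }

        /** Assumed hook: called when a restarted JobTracker asks for a refetch. */
        void markReset() {
            shouldResetEventsIndex.set(true);
        }

        /**
         * Returns up to max events starting at fromIndex. The reset flag is
         * read and cleared in one step, so only the first caller after
         * markReset() sees reset == true and receives events from index 0.
         */
        synchronized EventsUpdate getMapCompletionEvents(int fromIndex, int max) {
            boolean reset = shouldResetEventsIndex.getAndSet(false);
            int start = reset ? 0 : Math.min(fromIndex, events.size());
            int end = Math.min(events.size(), start + max);
            return new EventsUpdate(new ArrayList<>(events.subList(start, end)), reset);
        }
    }

    public static void main(String[] args) {
        EventFetcher fetcher = new EventFetcher();
        fetcher.addEvent("map_0 done");
        fetcher.addEvent("map_1 done");
        fetcher.markReset();
        EventsUpdate update = fetcher.getMapCompletionEvents(2, 10);
        System.out.println(update.reset + " " + update.events.size());
    }
}
```

The sketch models a single reader. With several reduce tasks polling the same
tracker, a single read-and-clear flag would be seen by only one of them, so
the flag would presumably have to be tracked per task.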
> Provide ability to persist running jobs (extend HADOOP-1876)
> ------------------------------------------------------------
>
> Key: HADOOP-3245
> URL: https://issues.apache.org/jira/browse/HADOOP-3245
> Project: Hadoop Core
> Issue Type: New Feature
> Components: mapred
> Reporter: Devaraj Das
> Assignee: Amar Kamat
> Attachments: HADOOP-3245-v2.5.patch, HADOOP-3245-v2.6.5.patch,
> HADOOP-3245-v2.6.9.patch
>
>
> This could probably extend the work done in HADOOP-1876. This feature can be
> applied for things like jobs being able to survive jobtracker restarts.