[
https://issues.apache.org/jira/browse/HADOOP-5337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677657#action_12677657
]
Amar Kamat commented on HADOOP-5337:
------------------------------------
Had a discussion with Devaraj. There are two ways in which this can be avoided:
# Have a safe-mode-like state where the jobtracker doesn't schedule any tasks
but allows trackers to join back. Upon recovery, the jobtracker knows about all
the tasktrackers that the previous jobtracker knew about. The new jobtracker
waits for all of those trackers to join back before opening up for scheduling,
so scheduling starts only when
_num-reconnect-trackers + num-trackers-lost == num-trackers-known-upon-restart_
(see the sketch below).
# Schedule tasks greedily, but kill the newly scheduled tasks if an older
attempt joins back. There will be a bit of thrashing.
We feel that option 1 is better as it avoids unnecessary scheduling and killing
of tasks.
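To make option 1 concrete, here is a minimal sketch of the safe-mode condition,
assuming a hypothetical helper that the jobtracker would consult before handing
out tasks. The class and method names below are illustrative only, not the
actual JobTracker code:
{code:java}
// Hypothetical helper tracking the restart safe-mode condition from option 1.
public class RestartSafeMode {
  private final int trackersKnownUponRestart; // trackers the previous jobtracker knew about
  private int reconnectedTrackers = 0;        // trackers that have rejoined since restart
  private int lostTrackers = 0;               // trackers declared lost (heartbeat expired) after restart

  public RestartSafeMode(int trackersKnownUponRestart) {
    this.trackersKnownUponRestart = trackersKnownUponRestart;
  }

  public synchronized void trackerReconnected() {
    reconnectedTrackers++;
  }

  public synchronized void trackerLost() {
    lostTrackers++;
  }

  /**
   * Scheduling stays blocked until every tracker known at restart has either
   * rejoined or been declared lost, i.e.
   * num-reconnect-trackers + num-trackers-lost == num-trackers-known-upon-restart.
   */
  public synchronized boolean schedulingAllowed() {
    return reconnectedTrackers + lostTrackers >= trackersKnownUponRestart;
  }
}
{code}
Until schedulingAllowed() returns true, the jobtracker would answer heartbeats
without assigning new tasks, which is what avoids scheduling a fresh attempt of
a task whose original attempt is still alive on a tracker that simply has not
rejoined yet.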
> JobTracker greedily schedules tasks without waiting for running tasks to join
> ------------------------------------------------------------------------------
>
> Key: HADOOP-5337
> URL: https://issues.apache.org/jira/browse/HADOOP-5337
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.20.0
> Reporter: Karam Singh
>
> This issue was observed when the JobTracker was restarted 3 times: 4 instances
> of each reduce task were found running. The issue shows up when the cluster is
> not fully occupied.
> In the testcase, map/reduce capacity is 200/200 slots respectively, the job
> profile is 11000 maps and 10 reduces, and speculative execution is off.
> The JobTracker was restarted 3 times at small intervals of about 5 minutes;
> after recovery, 40 reduce tasks were running, and the task details web page
> (taskdetails.jsp) was showing 4 running attempts of each reduce task.