[ 
https://issues.apache.org/jira/browse/HADOOP-5337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677657#action_12677657
 ] 

Amar Kamat commented on HADOOP-5337:
------------------------------------

Had a discussion with Devaraj. There are two ways in which this can be avoided.
# Have a safe-mode-like phase in which the jobtracker does not schedule any 
tasks but allows trackers to join back. Upon recovery, the jobtracker knows 
about all the tasktrackers that the previous jobtracker knew about. The new 
jobtracker waits for all of these trackers to either join back or be declared 
lost before opening up for scheduling. So scheduling will start when
  _num-reconnect-trackers + num-trackers-lost == num-trackers-known-upon-restart_
# Schedule tasks greedily, but kill a newly scheduled task if an older attempt 
of it joins back. There will be a bit of thrashing.
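
The gating condition in option 1 could be sketched roughly as follows. This is only an illustration and not the actual JobTracker code: the class {{RecoverySafeModeGate}} and its method names are hypothetical, and trackers are identified by plain strings for simplicity.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the option 1 gate; not part of Hadoop. */
class RecoverySafeModeGate {
    // Trackers the previous jobtracker knew about that are not yet accounted for.
    private final Set<String> pending;
    private final int knownUponRestart;
    private int reconnected = 0;
    private int lost = 0;

    RecoverySafeModeGate(Set<String> trackersKnownUponRestart) {
        this.pending = new HashSet<String>(trackersKnownUponRestart);
        this.knownUponRestart = this.pending.size();
    }

    /** Called when a previously known tracker joins back. */
    synchronized void trackerReconnected(String trackerName) {
        if (pending.remove(trackerName)) {
            reconnected++;
        }
    }

    /** Called when a previously known tracker is declared lost. */
    synchronized void trackerLost(String trackerName) {
        if (pending.remove(trackerName)) {
            lost++;
        }
    }

    /** True once num-reconnect-trackers + num-trackers-lost == num-trackers-known-upon-restart. */
    synchronized boolean canSchedule() {
        return reconnected + lost == knownUponRestart;
    }
}
```

Under these assumptions the scheduler would call {{canSchedule()}} before handing out any task and return no work while it is false. Counting a tracker only once (via the {{pending}} set) keeps a repeated heartbeat from opening the gate early.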

We personally feel that option 1 is better, as it avoids unnecessary scheduling 
and killing of tasks. 
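
For comparison, option 2 could look roughly like the sketch below, which shows where the wasted work comes from: every task that was scheduled greedily and whose older attempt later joins back costs one kill. The class {{GreedyRecoveryScheduler}}, its methods, and the string task/attempt ids are hypothetical and not taken from the Hadoop code base.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of option 2; not part of Hadoop. */
class GreedyRecoveryScheduler {
    // Task id -> attempt id launched by the restarted jobtracker.
    private final Map<String, String> newAttempts = new HashMap<String, String>();
    // Attempts that had to be killed because an older attempt joined back.
    private final List<String> killed = new ArrayList<String>();

    /** Schedule a fresh attempt without waiting for trackers to join back. */
    void scheduleGreedily(String taskId, String newAttemptId) {
        newAttempts.put(taskId, newAttemptId);
    }

    /**
     * An older attempt of the task was reported by a tracker that joined back.
     * Returns the newly scheduled attempt that gets killed, or null if none was scheduled.
     */
    String olderAttemptJoined(String taskId, String olderAttemptId) {
        String duplicate = newAttempts.remove(taskId);
        if (duplicate != null) {
            killed.add(duplicate);
        }
        return duplicate;
    }

    List<String> killedAttempts() {
        return killed;
    }
}
```

In this sketch the older attempt always wins, since it has already made progress; the size of {{killedAttempts()}} is a rough measure of the thrashing mentioned above.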

> JobTracker greedily schedules tasks without running tasks to join
> -----------------------------------------------------------------
>
>                 Key: HADOOP-5337
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5337
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.20.0
>            Reporter: Karam Singh
>
> This issue was observed when the JobTracker was restarted 3 times, after 
> which 4 instances of each reduce task were running. The issue occurs when 
> the cluster is not fully occupied.
> In the test case, map/reduce capacity is 200/200 slots respectively, and the 
> job profile is 11000 maps and 10 reduces with speculative execution off. 
> The JobTracker was restarted 3 times at short intervals of about 5 minutes, 
> and after recovery 40 reduce tasks were running. The task details web page 
> (taskdetails.jsp) showed 4 running attempts of each reduce task.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
