[ 
https://issues.apache.org/jira/browse/HADOOP-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715076#action_12715076
 ] 

Gautam Kowshik commented on HADOOP-5949:
----------------------------------------

Would it make sense to provide a way to force a kill from within the job? Once 
the user's MapReduce job detects that it has reached a state from which it 
cannot recover, it could hint or force the JT to end the job, an emergency 
button of sorts. This would let user implementations get out of bad jobs ASAP 
and improve cluster utilization. 

> JobTracker should give preference to failed tasks over virgin tasks so as to 
> terminate the job ASAP if it is eventually going to fail. 
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5949
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5949
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Gautam Kowshik
>            Assignee: Devaraj Das
>
> Case in point... I have 1585 maps and 160 slots (40 nodes). The job is such 
> that all maps fail within 2-3 minutes. It takes forever for the job to be 
> recognised as bad. It took 2526 failed attempts before any single task 
> reached 4 failed attempts. 
> As I understand it, the JT currently prefers a failed task if and only if a 
> task tracker holding a split replica for that map comes asking for a task. In 
> fact, there may not be a single TT in the mapred cluster that holds a replica 
> of the splits used in this job (pre-0.20). This delays the job failure 
> considerably and hence degrades cluster utilization as a whole. If I'm on a 
> shared cluster with many jobs waiting for this one to fail, that's bad. 
> The JT should prefer a failed task much earlier, rather than waiting for a 
> data-local TT to come around asking. 
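
To illustrate the effect described above, here is a minimal sketch of a toy 
model, not the JobTracker's actual scheduling code. It assumes every attempt 
fails, hands out one attempt at a time (ignoring slots, data locality and 
timing), and counts failed attempts until some task reaches 4 failures. The 
class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model only: names and structure are illustrative assumptions,
// not Hadoop's JobTracker implementation.
public class FailedFirstSketch {
    // A task that fails this many times fails the whole job (4, as in the report).
    static final int MAX_ATTEMPTS = 4;

    /**
     * Counts failed attempts until some task reaches MAX_ATTEMPTS,
     * assuming every attempt fails and attempts are handed out one by one.
     */
    static int failuresUntilJobFails(int numTasks, boolean preferFailed) {
        if (numTasks < 1) {
            throw new IllegalArgumentException("numTasks must be >= 1");
        }
        int[] failures = new int[numTasks];
        Deque<Integer> fresh = new ArrayDeque<>();   // tasks never attempted
        Deque<Integer> failed = new ArrayDeque<>();  // tasks waiting for a retry
        for (int t = 0; t < numTasks; t++) {
            fresh.add(t);
        }
        int total = 0;
        while (true) {
            // Pick from the preferred queue; fall back to the other if it is empty.
            Deque<Integer> first = preferFailed ? failed : fresh;
            Deque<Integer> second = preferFailed ? fresh : failed;
            int task = !first.isEmpty() ? first.poll() : second.poll();
            failures[task]++;
            total++;
            if (failures[task] == MAX_ATTEMPTS) {
                return total;  // the job would be declared failed here
            }
            failed.add(task);  // queue the task for another attempt
        }
    }

    public static void main(String[] args) {
        int maps = 1585;  // number of maps from the report
        System.out.println("failed first: " + failuresUntilJobFails(maps, true));
        System.out.println("fresh first:  " + failuresUntilJobFails(maps, false));
    }
}
```

In this toy model, preferring failed tasks ends the job after 4 failed 
attempts, while always preferring fresh tasks needs 3 * 1585 + 1 = 4756. The 
2526 failures reported above fall between these two extremes.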

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
