[ https://issues.apache.org/jira/browse/HADOOP-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644218#action_12644218 ]

Runping Qi commented on HADOOP-4305:
------------------------------------


The number of slots per node is guesswork at best; it just means that in the 
normal case the TT can run that many tasks of typical jobs concurrently. 
However, tasks of different jobs may need different amounts of resources. When 
that many resource-hungry tasks run concurrently, some tasks may fail, even 
though the TT would work fine with fewer tasks running concurrently. 
So in general, the TT should reduce its maximum number of allowed concurrent 
tasks when some tasks fail. 
If the failures continue, that number may eventually reach zero, which is 
effectively equivalent to being blacklisted.

After a certain period of time without failures, that number should be 
incremented gradually.
This way, the TT will adapt to varying load/resource situations autonomously.
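
A rough sketch of what I mean (the class and method names below are made up 
for illustration only; this is not existing TaskTracker code):

{code:java}
/**
 * Illustrative sketch of the adaptive limit described above: decrement
 * the effective slot count on each task failure, and slowly raise it
 * again after a failure-free quiet period.
 */
public class AdaptiveSlotLimit {
  private final int configuredSlots;  // e.g. mapred.tasktracker.map.tasks.maximum
  private final long quietPeriodMs;   // failure-free time required before raising the limit
  private int currentLimit;           // effective max concurrent tasks
  private long lastFailureTime;

  public AdaptiveSlotLimit(int configuredSlots, long quietPeriodMs) {
    this.configuredSlots = configuredSlots;
    this.quietPeriodMs = quietPeriodMs;
    this.currentLimit = configuredSlots;
  }

  /** A task failed: allow one fewer concurrent task from now on. */
  public synchronized void taskFailed(long now) {
    lastFailureTime = now;
    if (currentLimit > 0) {
      currentLimit--;  // at 0 the tracker is effectively blacklisted
    }
  }

  /** Called periodically, e.g. on each heartbeat: recover one slot per quiet period. */
  public synchronized int getCurrentLimit(long now) {
    if (currentLimit < configuredSlots && now - lastFailureTime >= quietPeriodMs) {
      currentLimit++;
      lastFailureTime = now;  // restart the clock before the next increment
    }
    return currentLimit;
  }
}
{code}

The scheduler would then hand the tracker at most getCurrentLimit(now) tasks; 
at zero the tracker receives no work, which is the same as being blacklisted, 
and it earns its slots back one at a time as long as it stays failure-free.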
 

> repeatedly blacklisted tasktrackers should get declared dead
> ------------------------------------------------------------
>
>                 Key: HADOOP-4305
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4305
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Christian Kunz
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.20.0
>
>
> When running a batch of jobs, it often happens that the same tasktrackers are 
> blacklisted again and again. This can slow down job execution considerably, 
> in particular when tasks fail because of timeouts.
> It would make sense to no longer assign any tasks to such tasktrackers and to 
> declare them dead.
