[ 
https://issues.apache.org/jira/browse/HADOOP-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598614#action_12598614
 ] 

Iván de Prado commented on HADOOP-3420:
---------------------------------------

I understand. So the solution is not so easy. The problem I see with the 
current configuration schema arises in clusters that usually execute jobs 
sequentially but sometimes execute jobs in parallel. Suppose you have nodes 
with N CPUs and you can execute at most N tasks per node with the available 
memory. You have to configure at most N/2 maps and N/2 reduces per node if 
you want to be able to execute some jobs in parallel, but then the cluster 
will use only half of its resources when executing sequential jobs.
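The tradeoff above can be made concrete with a small sketch (hypothetical numbers for a node with N = 8 CPUs, not taken from any real cluster):

```python
# Current split-limit schema: map max = reduce max = N/2 so that two
# parallel jobs together never exceed the node's capacity.
N = 8
max_maps = N // 2      # 4
max_reduces = N // 2   # 4

# During its map phase, a single sequential job can occupy at most
# max_maps slots, leaving half the node idle:
sequential_utilization = max_maps / N
print(sequential_utilization)  # 0.5

# Two jobs in parallel stay within capacity, which is the point of
# the split limits:
parallel_tasks = max_maps + max_reduces
print(parallel_tasks <= N)  # True
```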

Is it possible to have a configuration schema that allows using all resources 
for sequential jobs, but does not exceed the available resources when jobs 
execute in parallel?

Does it make sense to have a mapred.tasktracker.tasks.maximum that limits the 
maximum total number of tasks per node, while forcing 
mapred.tasktracker.reduce.tasks.maximum to be smaller than 
mapred.tasktracker.tasks.maximum to avoid a possible deadlock?
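A minimal sketch of the suggested combined check (hypothetical logic, not actual TaskTracker code; the reduce cap is kept below the total cap so at least one map slot always survives, which is what prevents reducers from filling every slot while waiting on map output that can never be produced):

```python
# Hypothetical admission check combining a total cap with per-type caps.
def can_launch(kind, maps, reduces, max_total, max_maps, max_reduces):
    """True if a new task of `kind` ('map' or 'reduce') fits all limits."""
    if maps + reduces >= max_total:
        return False  # node already at its total cap
    if kind == "map":
        return maps < max_maps
    return reduces < max_reduces

# Example: max_total=8, max_maps=8, max_reduces=7 (reduce cap < total cap).
print(can_launch("map", 0, 0, 8, 8, 7))     # True: empty node
print(can_launch("reduce", 0, 7, 8, 8, 7))  # False: reduce cap reached
print(can_launch("map", 0, 7, 8, 8, 7))     # True: a map slot remains
print(can_launch("map", 8, 0, 8, 8, 7))     # False: node is full
```

With max_reduces equal to max_total, all 8 slots could legally be taken by reducers, and no map could ever start on that node.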

Thanks for your amazing open-source project. 

> Recover the deprecated mapred.tasktracker.tasks.maximum
> -------------------------------------------------------
>
>                 Key: HADOOP-3420
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3420
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.16.0, 0.16.1, 0.16.2, 0.16.3, 0.16.4
>            Reporter: Iván de Prado
>
> https://issues.apache.org/jira/browse/HADOOP-1274 replaced the configuration 
> attribute mapred.tasktracker.tasks.maximum with 
> mapred.tasktracker.map.tasks.maximum and 
> mapred.tasktracker.reduce.tasks.maximum because it sometimes makes sense to 
> have more mappers than reducers assigned to each node.
> But deprecating mapred.tasktracker.tasks.maximum could be an issue in some 
> situations. For example, when more than one job is running, reduce tasks plus 
> map tasks eat too many resources. To avoid these cases an upper limit on 
> tasks is needed. So I propose to have the configuration parameter 
> mapred.tasktracker.tasks.maximum as a total limit on tasks. It is compatible 
> with mapred.tasktracker.map.tasks.maximum and 
> mapred.tasktracker.reduce.tasks.maximum.
> As an example:
> I have an 8-core, 4 GB, 4-node cluster. I want to limit the number of tasks 
> per node to 8. Eight tasks per node would use almost 100% CPU and 4 GB of 
> memory. I have set:
>   mapred.tasktracker.map.tasks.maximum -> 8
>   mapred.tasktracker.reduce.tasks.maximum -> 8 
> 1) When running only one job at a time, it works smoothly: 8 tasks on 
> average per node, no swapping on the nodes, almost 4 GB of memory usage and 
> 100% CPU usage. 
> 2) When running more than one job at the same time, it works really badly: 
> 16 tasks on average per node, 8 GB of memory usage (4 GB swapped), and a lot 
> of system CPU usage.
> So, I think it makes sense to restore the old attribute 
> mapred.tasktracker.tasks.maximum, making it compatible with the new ones. 
> A task tracker would then not:
>  - run more than mapred.tasktracker.tasks.maximum tasks per node,
>  - run more than mapred.tasktracker.map.tasks.maximum mappers per node, 
>  - run more than mapred.tasktracker.reduce.tasks.maximum reducers per node. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
