Amit,

The mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum properties can be set on a per-host basis in each node's hadoop-site.xml. With this you can configure nodes with more or fewer cores, RAM, etc. to take on correspondingly more or less work.
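For example, a beefier node's hadoop-site.xml might look something like the sketch below (the slot counts are illustrative, not recommendations; tune them to your hardware):

```xml
<!-- hadoop-site.xml on a larger node: allow more concurrent task slots.
     The values below are examples only. -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>
</configuration>
```

The TaskTracker on each node reads its local copy at startup, so you'd set different values on different machines and restart the TaskTracker for them to take effect.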
There's no current mechanism, though, to provide feedback to the task scheduler based on actual machine utilization in real time.

- Aaron

On Tue, Apr 7, 2009 at 7:54 AM, amit handa <[email protected]> wrote:
> Hi,
>
> Is there a way I can control the number of tasks that can be spawned on a
> machine based on the machine's capacity and how loaded the machine already
> is?
>
> My use case is as follows:
>
> I have to perform task1, task2, task3, ... taskN.
> These tasks have varied CPU and memory usage patterns.
> All tasks of type task1 and task3 can take 80-90% CPU and 800 MB of RAM.
> All tasks of type task2 take only 1-2% of CPU and 5-10 MB of RAM.
>
> How do I model this using Hadoop? Can I use only one cluster for running
> all these types of tasks?
> Should I use a different Hadoop cluster for each task type? If yes, then
> how do I share data between these tasks (the data can be a few MB to a
> few GB)?
>
> Please suggest or point to any docs which I can dig up.
>
> Thanks,
> Amit
>
