Hi Michael,

Thanks for the prompt reply. I was going through your bug description, but the second statement didn't completely make sense to me:

> When I start a job, hadoop uses mapred.tasktracker.tasks.maximum on the
> jobtracker. Once these tasks finish, it is the tasktracker's value of
> mapred.tasktracker.tasks.maximum that decides how many new tasks are
> created for each host.
Could you please explain it?

Thanks,
Neeraj

-----Original Message-----
From: Michael Bieniosek [mailto:[EMAIL PROTECTED]]
Sent: Friday, August 17, 2007 11:55 AM
To: [email protected]; Mahajan, Neeraj
Subject: Re: Query about number of task trackers specific to a site

https://issues.apache.org/jira/browse/HADOOP-1245

This bug makes it difficult to run hadoop efficiently on heterogeneous clusters. Aside from fixing the bug, your best options are probably:

1) split your large heterogeneous cluster into smaller homogeneous clusters
2) run with lots of small tasks, so the tasktracker's value for maxCurrentTasks replaces the jobtracker's bad value more quickly.

-Michael

On 8/17/07 11:47 AM, "Mahajan, Neeraj" <[EMAIL PROTECTED]> wrote:

> Hi,
>
> In my hadoop setup, say I have 4 machines (M1 - M4). M1 is the master
> with the JobTracker.
> Say I want 4 parallel tasks on M1, 2 each on M2/M3, and 6 on M4. I set the
> corresponding property (mapred.tasktracker.tasks.maximum) in
> hadoop-site.xml for each of the machines.
> I observed that when all the task trackers start, maxCurrentTasks is
> loaded correctly. But when I execute a job, I can see that 4
> TaskTracker$Child processes execute on each of the machines. Any idea
> what I am missing, or is this a known bug?
>
> Regards,
> Neeraj
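For reference, the per-host limit Neeraj describes goes in each machine's hadoop-site.xml. A minimal sketch of such an entry, using M4's example value of 6 (the description text here is only illustrative), would look roughly like:

    <property>
      <name>mapred.tasktracker.tasks.maximum</name>
      <value>6</value>
      <description>Maximum number of tasks to run in parallel on this
      tasktracker (e.g. 6 on M4, 2 on M2/M3, 4 on M1).</description>
    </property>

As Michael's message notes, though, with HADOOP-1245 unfixed the jobtracker's own copy of this value is what governs the first wave of tasks; only once those finish does each tasktracker's per-host value take effect.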
