Well, it's been a while since I filed that bug, so it's possible that things have changed, or that I don't remember the circumstances correctly.
Sorry!

-Michael

On 8/17/07 1:15 PM, "Mahajan, Neeraj" <[EMAIL PROTECTED]> wrote:

> Hmm ..
> I am not observing the second behavior. I ran a job of more than 500
> tasks. Each task tracker executed many tasks, but at all times I could
> see that 4 child processes were running on each machine.
>
> ~ Neeraj
>
> -----Original Message-----
> From: Michael Bieniosek [mailto:[EMAIL PROTECTED]
> Sent: Friday, August 17, 2007 1:01 PM
> To: Mahajan, Neeraj; [email protected]
> Subject: Re: Query about number of task trackers specific to a site
>
> I updated the description on the jira ticket.
>
> You can imagine that the cluster could potentially operate in two modes:
> 1) configure the value for number of parallel tasks once on the
> jobtracker, so each tasktracker gets the same number of parallel tasks.
> This assumes that all the machines in the cluster have comparable
> hardware.
> 2) configure the value for number of parallel tasks for each
> tasktracker, so each tasktracker could potentially get a different
> number of parallel tasks.
> This is what you want for your situation.
>
> When a new hadoop job starts up, the cluster operates in mode 1). After
> one task finishes on the tasktracker, that tasktracker seems to move
> into mode 2).
>
> -Michael
>
> On 8/17/07 12:31 PM, "Mahajan, Neeraj" <[EMAIL PROTECTED]> wrote:
>
>> Hi Michael,
>>
>> Thanks for the prompt reply. I was going through your bug description,
>> but it (the second statement) didn't completely make sense to me.
>>
>>> When I start a job, hadoop uses mapred.tasktracker.tasks.maximum
>>> on the jobtracker. Once these tasks finish, it is the tasktracker's
>>> value of mapred.tasktracker.tasks.maximum that decides how many new
>>> tasks are created for each host.
>>
>> Could you please explain it.
>>
>> Thanks,
>> Neeraj
>>
>> -----Original Message-----
>> From: Michael Bieniosek [mailto:[EMAIL PROTECTED]
>> Sent: Friday, August 17, 2007 11:55 AM
>> To: [email protected]; Mahajan, Neeraj
>> Subject: Re: Query about number of task trackers specific to a site
>>
>> https://issues.apache.org/jira/browse/HADOOP-1245
>>
>> This bug makes it difficult to run hadoop on heterogeneous clusters
>> efficiently. Aside from fixing the bug, your best options are probably:
>> 1) split your large heterogeneous cluster into smaller homogeneous
>> clusters
>> 2) run with lots of small tasks so the tasktracker's value for
>> maxCurrentTasks replaces the jobtracker's bad value more quickly.
>>
>> -Michael
>>
>> On 8/17/07 11:47 AM, "Mahajan, Neeraj" <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>> In my hadoop setup, say I have 4 machines (M1 - M4). M1 is the master
>>> with the Job tracker.
>>> Say I want 4 parallel tasks on M1, 2 on M2/M3 and 6 on M4. I set the
>>> corresponding property (mapred.tasktracker.tasks.maximum) in
>>> hadoop-site.xml for each of the machines.
>>> I observed that when all the task trackers start, maxCurrentTasks is
>>> loaded correctly. But when I execute a job, I can see that 4
>>> TaskTracker$Child execute on each of the machines. Any idea what I am
>>> missing or is this a known bug?
>>>
>>> Regards,
>>> Neeraj
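
For reference, the per-machine setting described in the original question would normally be written as a property entry in each tasktracker's hadoop-site.xml. The fragment below is only a sketch for M4, using the value 6 from the question. It assumes the usual Hadoop configuration file layout, and it does not work around the behavior tracked in HADOOP-1245.

```xml
<!-- hadoop-site.xml on M4 (illustrative; the value 6 comes from the question above) -->
<configuration>
  <property>
    <name>mapred.tasktracker.tasks.maximum</name>
    <value>6</value>
  </property>
</configuration>
```

Following the same pattern, M1 would set the value to 4, and M2 and M3 would set it to 2.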
