Hi Michael,

Thanks for the prompt reply. I was going through your bug description, but the second statement didn't completely make sense to me:

> When I start a job, hadoop uses mapred.tasktracker.tasks.maximum on the
> jobtracker. Once these tasks finish, it is the tasktracker's value of
> mapred.tasktracker.tasks.maximum that decides how many new tasks are
> created for each host.
Could you please explain it?

Thanks,
Neeraj

-----Original Message-----
From: Michael Bieniosek [mailto:[EMAIL PROTECTED]]
Sent: Friday, August 17, 2007 11:55 AM
To: [email protected]; Mahajan, Neeraj
Subject: Re: Query about number of task trackers specific to a site

https://issues.apache.org/jira/browse/HADOOP-1245

This bug makes it difficult to run hadoop efficiently on heterogeneous clusters. Aside from fixing the bug, your best options are probably:

1) split your large heterogeneous cluster into smaller homogeneous clusters
2) run with lots of small tasks, so the tasktracker's value for maxCurrentTasks replaces the jobtracker's bad value more quickly.

-Michael

On 8/17/07 11:47 AM, "Mahajan, Neeraj" <[EMAIL PROTECTED]> wrote:

> Hi,
>
> In my hadoop setup, say I have 4 machines (M1 - M4). M1 is the master
> with the JobTracker.
> Say I want 4 parallel tasks on M1, 2 each on M2/M3, and 6 on M4. I set the
> corresponding property (mapred.tasktracker.tasks.maximum) in
> hadoop-site.xml for each of the machines.
> I observed that when all the task trackers start, maxCurrentTasks is
> loaded correctly. But when I execute a job, I can see that 4
> TaskTracker$Child processes execute on each of the machines. Any idea
> what I am missing, or is this a known bug?
>
> Regards,
> Neeraj
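For reference, the per-host limit Neeraj describes goes in each machine's hadoop-site.xml. A minimal sketch of such an entry, using M4's example value of 6 (the description text here is only illustrative), would look roughly like:

    <property>
      <name>mapred.tasktracker.tasks.maximum</name>
      <value>6</value>
      <description>Maximum number of tasks to run in parallel on this
      tasktracker (e.g. 6 on M4, 2 on M2/M3, 4 on M1).</description>
    </property>

As Michael's message notes, though, with HADOOP-1245 unfixed the jobtracker's own copy of this value is what governs the first wave of tasks; only once those finish does each tasktracker's per-host value take effect.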
