RE: Query about number of task trackers specific to a site

Mahajan, Neeraj Fri, 17 Aug 2007 13:16:07 -0700

Hmm ..
I am not observing the second behavior. I ran a job of more than 500
tasks. Each task tracker executed many tasks, but at all times I could
see that 4 child processes were running on each machine.


~ Neeraj 

-----Original Message-----
From: Michael Bieniosek [mailto:[EMAIL PROTECTED] 
Sent: Friday, August 17, 2007 1:01 PM
To: Mahajan, Neeraj; [email protected]
Subject: Re: Query about number of task trackers specific to a site

I updated the description on the jira ticket.

You can imagine that the cluster could potentially operate in two modes:
1) configure the value for number of parallel tasks once on the
jobtracker, so each tasktracker gets the same number of parallel tasks.
This assumes that all the machines in the cluster have comparable
hardware.
2) configure the value for number of parallel tasks for each
tasktracker, so each tasktracker could potentially get a different
number of parallel tasks.
This is what you want for your situation.

When a new hadoop job starts up, the cluster operates in mode 1).  After
one task finishes on the tasktracker, that tasktracker seems to move
into mode 2).  

-Michael
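A minimal sketch of the two modes as Michael describes them, reduced to a single rule: before a tasktracker's first task finishes, it runs as many parallel tasks as the jobtracker's value of mapred.tasktracker.tasks.maximum allows (mode 1); afterwards its own value applies (mode 2). This is a hypothetical model of the behaviour reported in HADOOP-1245, not Hadoop's scheduler code. The class and method names (SlotModel, slots) are made up for illustration, and the per-machine values come from Neeraj's example, with the jobtracker on M1.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the HADOOP-1245 behaviour described in this thread; not Hadoop code.
public class SlotModel {

    // Parallel tasks a tasktracker may run: the jobtracker's value until the
    // tracker's first task finishes (mode 1), then the tracker's own value (mode 2).
    static int slots(int trackerMax, int jobtrackerMax, boolean firstTaskFinished) {
        return firstTaskFinished ? trackerMax : jobtrackerMax;
    }

    public static void main(String[] args) {
        // Per-machine mapred.tasktracker.tasks.maximum values from Neeraj's example.
        Map<String, Integer> trackerMax = new LinkedHashMap<>();
        trackerMax.put("M1", 4);
        trackerMax.put("M2", 2);
        trackerMax.put("M3", 2);
        trackerMax.put("M4", 6);
        int jobtrackerMax = trackerMax.get("M1"); // the jobtracker runs on M1

        for (Map.Entry<String, Integer> e : trackerMax.entrySet()) {
            System.out.printf("%s: start of job = %d, after first task = %d%n",
                    e.getKey(),
                    slots(e.getValue(), jobtrackerMax, false),
                    slots(e.getValue(), jobtrackerMax, true));
        }
    }
}
```

Under this model every machine starts with 4 slots and then moves to 4, 2, 2 and 6 respectively once one of its tasks has finished. Neeraj's observation at the top of the thread (4 child processes per machine for the whole 500-task job) does not show that second phase.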

On 8/17/07 12:31 PM, "Mahajan, Neeraj" <[EMAIL PROTECTED]> wrote:

> Hi Michael,
> 
> Thanks for the prompt reply. I was going through your bug description,
> but it (the second statement) didn't completely make sense to me:
>> When I start a job, hadoop uses mapred.tasktracker.tasks.maximum
>> on the jobtracker. Once these tasks finish, it is the tasktracker's
>> value of mapred.tasktracker.tasks.maximum that decides how many new
>> tasks are created for each host.
> 
> Could you please explain it.
> 
> Thanks,
> Neeraj
> 
> -----Original Message-----
> From: Michael Bieniosek [mailto:[EMAIL PROTECTED]
> Sent: Friday, August 17, 2007 11:55 AM
> To: [email protected]; Mahajan, Neeraj
> Subject: Re: Query about number of task trackers specific to a site
> 
> https://issues.apache.org/jira/browse/HADOOP-1245
> 
> This bug makes it difficult to run hadoop on heterogeneous clusters
> efficiently.  Aside from fixing the bug, your best options are
> probably:
> 1) split your large heterogeneous cluster into smaller homogeneous 
> clusters
> 2) run with lots of small tasks so the tasktracker's value for 
> maxCurrentTasks replaces the jobtracker's bad value more quickly.
> 
> -Michael
> 
> On 8/17/07 11:47 AM, "Mahajan, Neeraj" <[EMAIL PROTECTED]> wrote:
> 
>> Hi,
>>  
>> In my hadoop setup, say I have 4 machines (M1 - M4). M1 is the master
>> with the Job tracker.
>> Say I want 4 parallel tasks on M1, 2 on M2/M3 and 6 on M4. I set the
>> corresponding property (mapred.tasktracker.tasks.maximum) in
>> hadoop-site.xml for each of the machines.
>> I observed that when all the task trackers start, maxCurrentTasks is
>> loaded correctly. But when I execute a job, I can see that 4
>> TaskTracker$Child processes execute on each of the machines. Any idea
>> what I am missing, or is this a known bug?
>> 
>> Regards,
>> Neeraj
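Michael's second workaround (lots of small tasks) can be roughly quantified under a simplifying assumption: all tasks take equally long, so each tracker's first wave of tasks, launched with the jobtracker's value, finishes together and every later task uses the tracker's own value. The sketch below uses hypothetical names (FirstWave, fractionUnderJobtrackerValue) and is not Hadoop code; it estimates the share of a job's tasks launched under the jobtracker's value for 4 trackers and a jobtracker value of 4.

```java
import java.util.Locale;

// Hypothetical estimate under an equal-task-duration assumption; not Hadoop code.
public class FirstWave {

    // Share of a job's tasks launched while the jobtracker's value still applies:
    // only the first wave (trackers * jobtrackerMax tasks, capped at the job size).
    static double fractionUnderJobtrackerValue(int totalTasks, int trackers, int jobtrackerMax) {
        int firstWave = Math.min(totalTasks, trackers * jobtrackerMax);
        return (double) firstWave / totalTasks;
    }

    public static void main(String[] args) {
        int trackers = 4;
        int jobtrackerMax = 4;
        // The same job split into progressively more (and smaller) tasks.
        for (int totalTasks : new int[] {20, 100, 500}) {
            System.out.printf(Locale.ROOT, "%d tasks: %.1f%% launched with the jobtracker's value%n",
                    totalTasks,
                    100 * fractionUnderJobtrackerValue(totalTasks, trackers, jobtrackerMax));
        }
    }
}
```

With these assumptions it prints 80.0% for 20 tasks, 16.0% for 100 tasks and 3.2% for 500 tasks, which illustrates why splitting the same work into many small tasks shortens the period in which the wrong limit applies.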
