So I used JobConf.setNumMapTasks and it worked: with setNumMapTasks(40) I ended up with 100 maps rather than the 6 I had initially. My data is only 32 MB, but every line is converted into an object and the computation is CPU-intensive, so I would like as many map tasks as there are cores. There is no XML entry of the form map.tasks.maximum in my configuration. I'm using Cloudera's distribution 0.18.3-14.
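For anyone searching the archives: there is indeed no map.tasks.maximum entry; in 0.18.x the per-node slot limits are the mapred.tasktracker.* properties in conf/hadoop-site.xml. A sketch, assuming quad-core workers (the TaskTrackers need a restart to pick the change up):

```xml
<!-- conf/hadoop-site.xml (Hadoop 0.18.x) -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value> <!-- map slots per TaskTracker; the default is 2 -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value> <!-- reduce slots per TaskTracker -->
</property>
```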
Morpheus: Do you believe in fate, Neo?
Neo: No.
Morpheus: Why Not?
Neo: Because I don't like the idea that I'm not in control of my life.

----- Original Message ----
From: Chandraprakash Bhagtani <[email protected]>
To: [email protected]
Sent: Saturday, September 12, 2009 8:22:13 AM
Subject: Re: nodes lying idle

No, the *mapred.tasktracker.map/reduce.tasks.maximum* value is per datanode, i.e. that many mappers and reducers will run on a single datanode. For example, if you have set

*mapred.tasktracker.map.tasks.maximum = 4*
*mapred.tasktracker.reduce.tasks.maximum = 4*

and the number of datanodes is 40, then the entire cluster's map task capacity = 4 * 40 = 160.

*map.tasks.maximum = 6* means only 6 maps will run for your job, which will definitely not use all of your cluster's resources. What is the size of your data, and what are your cluster's specifications?

--
Thanks & Regards,
Chandra Prakash Bhagtani

On Sat, Sep 12, 2009 at 12:25 PM, himanshu chandola <[email protected]> wrote:

> Thanks for the tip.
> So is the mapred.tasktracker.map/reduce.tasks.maximum value for the entire
> cluster? I had set map.tasks.maximum to 6, and hitting the web interface
> shows that the total map tasks for my job is just 6. My tasks are
> CPU-intensive, so I would like each of my quad-core nodes to run at least
> 4 Hadoop map tasks. The whole cluster is running just 6, one on each of
> 6 nodes.
>
> ----- Original Message ----
> From: Chandraprakash Bhagtani <[email protected]>
> To: [email protected]
> Sent: Saturday, September 12, 2009 1:49:41 AM
> Subject: Re: nodes lying idle
>
> You need to check your cluster's Map/Reduce task capacity, i.e. how many
> map/reduce tasks can run on the cluster at once. You can check it at
> http://JobtrackerServerIP:50030.
> You should also check the total number of map tasks in your job: it should
> be greater than the map task capacity of the cluster.
>
> Initially, reduce tasks will be idle until the first batch of map tasks
> completes.
>
> --
> Thanks & Regards,
> Chandra Prakash Bhagtani
>
> On Sat, Sep 12, 2009 at 10:31 AM, himanshu chandola <[email protected]> wrote:
>
> > Hi everyone,
> > I have a cluster of 40 nodes. The input file has 2^18 lines, and every
> > line is an input to a map task. Every node is a quad core, so I've set
> > mapred.tasktracker.map/reduce.tasks.maximum to a value greater than 4.
> > The first 20 nodes show a Hadoop process at 100% CPU, but with only one
> > process running; since these are quad cores I would have expected to
> > see 4 Java processes at 100% (there are 5 Java processes on each system,
> > but 4 are idle and only one is using 100% of one CPU). On the last half
> > of the nodes, the CPU usage of the Hadoop processes is 0. This is really
> > strange, since my map tasks are progressing very slowly and I would have
> > liked to use all nodes and all cores.
> >
> > What could possibly be wrong? It would really help if anyone could
> > suggest something.
> >
> > thanks
> >
> > H
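An aside for archive readers on the 100-maps-from-setNumMapTasks(40) surprise at the top of the thread: in the old API, setNumMapTasks() is only a hint. If I recall the 0.18 FileInputFormat correctly, it derives a goal size from the hint and then clamps the split size per file; a minimal sketch of that clamp (simplified, not the full split logic):

```java
public class SplitSizeSketch {
    // Rough sketch of FileInputFormat's split-size clamp in 0.18:
    // splitSize = max(minSplitSize, min(goalSize, blockSize)),
    // where goalSize = totalInputBytes / requested number of maps.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long totalBytes = 32L * 1024 * 1024;  // ~32 MB of input, as in the thread
        long goalSize = totalBytes / 40;      // setNumMapTasks(40) hint
        long blockSize = 64L * 1024 * 1024;   // default HDFS block size
        long split = computeSplitSize(goalSize, 1, blockSize);
        // Each input file is then carved into chunks of roughly 'split'
        // bytes, so the final map count depends on how the input is laid
        // out in files, not just on the hint.
        System.out.println(split); // prints 838860
    }
}
```

The practical upshot, consistent with Chandra's advice: tune the requested map count so the resulting number of splits is at least the cluster's map slot capacity (4 slots x 40 nodes = 160 here).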
