So I used JobConf.setNumMapTasks and it worked: with setNumMapTasks(40) I ended up with 100 maps rather than the 6 I had initially. My data is only 32 MB, but every line is converted into an object and the computation is CPU-intensive, so I would like as many map tasks as there are cores. There is no XML entry of the form map.tasks.maximum in my configuration. I'm using Cloudera's distribution 0.18.3-14.
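For anyone searching the archives: there is indeed no map.tasks.maximum entry; in 0.18.x the per-node slot limits are the mapred.tasktracker.* properties in conf/hadoop-site.xml. A sketch, assuming quad-core workers (the TaskTrackers need a restart to pick the change up):

```xml
<!-- conf/hadoop-site.xml (Hadoop 0.18.x) -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value> <!-- map slots per TaskTracker; the default is 2 -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value> <!-- reduce slots per TaskTracker -->
</property>
```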
Morpheus: Do you believe in fate, Neo?
Neo: No.
Morpheus: Why Not?
Neo: Because I don't like the idea that I'm not in control of my life.

----- Original Message ----
From: Chandraprakash Bhagtani <[email protected]>
To: [email protected]
Sent: Saturday, September 12, 2009 8:22:13 AM
Subject: Re: nodes lying idle

No, the *mapred.tasktracker.map/reduce.tasks.maximum* value is per datanode, i.e. that many mappers and reducers will run on a single datanode. For example, if you have set

*mapred.tasktracker.map.tasks.maximum = 4*
*mapred.tasktracker.reduce.tasks.maximum = 4*

and the number of datanodes is 40, then the entire cluster's map task capacity = 4 * 40 = 160.

*map.tasks.maximum = 6* means only 6 maps will run for your job, which will definitely not use all of your cluster's resources. What is the size of your data, and what are your cluster's specifications?

--
Thanks & Regards,
Chandra Prakash Bhagtani

On Sat, Sep 12, 2009 at 12:25 PM, himanshu chandola <[email protected]> wrote:

> Thanks for the tip.
> So is the mapred.tasktracker.map/reduce.tasks.maximum value for the entire
> cluster? I had set map.tasks.maximum to 6, and hitting the web interface
> shows that the total map tasks for my job is just 6. My tasks are
> CPU-intensive, so I would like each of my quad-core nodes to run at least
> 4 Hadoop map tasks. The whole cluster is running just 6, one on each of
> 6 nodes.
>
> ----- Original Message ----
> From: Chandraprakash Bhagtani <[email protected]>
> To: [email protected]
> Sent: Saturday, September 12, 2009 1:49:41 AM
> Subject: Re: nodes lying idle
>
> You need to check your cluster's Map/Reduce task capacity, i.e. how many
> map/reduce tasks can run on the cluster at once. You can check it at
> http://JobtrackerServerIP:50030.
> You should also check the total number of map tasks in your job: it should
> be greater than the map task capacity of the cluster.
>
> Initially, reduce tasks will be idle until the first batch of map tasks
> completes.
>
> --
> Thanks & Regards,
> Chandra Prakash Bhagtani
>
> On Sat, Sep 12, 2009 at 10:31 AM, himanshu chandola <[email protected]> wrote:
>
> > Hi everyone,
> > I have a cluster of 40 nodes. The input file has 2^18 lines, and every
> > line is an input to a map task. Every node is a quad core, so I've set
> > mapred.tasktracker.map/reduce.tasks.maximum to a value greater than 4.
> > The first 20 nodes show a Hadoop process at 100% CPU, but with only one
> > process running; since these are quad cores I would have expected to
> > see 4 Java processes at 100% (there are 5 Java processes on each system,
> > but 4 are idle and only one is using 100% of one CPU). On the last half
> > of the nodes, the CPU usage of the Hadoop processes is 0. This is really
> > strange, since my map tasks are progressing very slowly and I would have
> > liked to use all nodes and all cores.
> >
> > What could possibly be wrong? It would really help if anyone could
> > suggest something.
> >
> > thanks
> >
> > H
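An aside for archive readers on the 100-maps-from-setNumMapTasks(40) surprise at the top of the thread: in the old API, setNumMapTasks() is only a hint. If I recall the 0.18 FileInputFormat correctly, it derives a goal size from the hint and then clamps the split size per file; a minimal sketch of that clamp (simplified, not the full split logic):

```java
public class SplitSizeSketch {
    // Rough sketch of FileInputFormat's split-size clamp in 0.18:
    // splitSize = max(minSplitSize, min(goalSize, blockSize)),
    // where goalSize = totalInputBytes / requested number of maps.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long totalBytes = 32L * 1024 * 1024;  // ~32 MB of input, as in the thread
        long goalSize = totalBytes / 40;      // setNumMapTasks(40) hint
        long blockSize = 64L * 1024 * 1024;   // default HDFS block size
        long split = computeSplitSize(goalSize, 1, blockSize);
        // Each input file is then carved into chunks of roughly 'split'
        // bytes, so the final map count depends on how the input is laid
        // out in files, not just on the hint.
        System.out.println(split); // prints 838860
    }
}
```

The practical upshot, consistent with Chandra's advice: tune the requested map count so the resulting number of splits is at least the cluster's map slot capacity (4 slots x 40 nodes = 160 here).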
