I only have one task tracker right now because I'm just setting up some testing, but that one machine only runs one mapper at a time. In the job tracker web interface I only ever see one job running at a time, and as far as I can tell no jobs ever start simultaneously. Is a single task tracker limited to spawning *only* one child JVM at a time to run maps for a single job? How do I get it to spawn 4-6 children to run map tasks at once?

Josh Ferguson

On Wed, Jan 28, 2009 at 7:38 AM, Ashish Thusoo <[email protected]> wrote:
> How many nodes do you have in your map/reduce cluster? It could just be the
> case that the cluster does not have enough map slots, so all 344 maps cannot
> be run simultaneously. Suppose you had a 4 node cluster. Then by your
> configuration you would have a total of 20 map slots. So you would see 20
> mappers started off, and then as each mapper finishes another would move
> from pending to started. This could give an illusion that mappers are
> running one at a time, though at any time 20 are running concurrently.
>
> Also you could potentially decrease the number of mappers being run by
> setting mapred.min.split.size.
>
> Ashish
>
> ________________________________________
> From: Josh Ferguson [[email protected]]
> Sent: Tuesday, January 27, 2009 9:20 PM
> To: [email protected]
> Subject: Number of tasks
>
> Ok so I'm experimenting with the slow running hive query I was having
> earlier. It was indeed only processing one map task at a time even
> though I *think* I told it to do more. Anyone who is good with hadoop
> feel free to speak up here as well, this is my first foray into trying
> to set up jobs for production. Here is the relevant configuration used
> on the job tracker and task tracker machines.
>
> <property>
>   <name>mapred.map.tasks</name>
>   <value>7</value>
>   <description>The default number of map tasks per job. Typically set
>   to a prime several times greater than number of available hosts.
>   Ignored when mapred.job.tracker is "local".
>   </description>
> </property>
>
> <property>
>   <name>mapred.reduce.parallel.copies</name>
>   <value>20</value>
>   <description>The default number of parallel transfers run by reduce
>   during the copy(shuffle) phase.
>   </description>
> </property>
>
> <property>
>   <name>mapred.tasktracker.map.tasks.maximum</name>
>   <value>5</value>
>   <description>The maximum number of map tasks that will be run
>   simultaneously by a task tracker.
>   </description>
> </property>
>
> <property>
>   <name>mapred.tasktracker.reduce.tasks.maximum</name>
>   <value>5</value>
>   <description>The maximum number of reduce tasks that will be run
>   simultaneously by a task tracker.
>   </description>
> </property>
>
> The query was SELECT COUNT(DISTINCT(table.field)) FROM table;
>
> Anyone know why this might only be running one map task at a time?
> Takes about 5 minutes to go through 344 of them at this rate.
>
> Josh Ferguson
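
For reference, the slot arithmetic from Ashish's reply can be sketched in a few lines of Python. This is only a rough model, not anything Hadoop itself runs: it assumes every task tracker actually offers its full mapred.tasktracker.map.tasks.maximum slots and that map tasks are scheduled in even rounds. The function names are made up for illustration.

```python
import math


def total_map_slots(task_trackers: int, map_slots_per_tracker: int) -> int:
    """Cluster-wide map slots, assuming every tracker offers its full maximum.

    map_slots_per_tracker stands in for mapred.tasktracker.map.tasks.maximum.
    """
    return task_trackers * map_slots_per_tracker


def scheduling_rounds(map_tasks: int, slots: int) -> int:
    """Rounds needed to run all map tasks if each round fills every slot.

    This is a simplification: real tasks finish at different times and are
    replaced one by one, not in lockstep rounds.
    """
    return math.ceil(map_tasks / slots)


if __name__ == "__main__":
    # 344 map tasks and 5 map slots per tracker, as in the thread.
    for trackers in (1, 4):
        slots = total_map_slots(trackers, 5)
        rounds = scheduling_rounds(344, slots)
        print(f"{trackers} tracker(s): {slots} slots, {rounds} rounds")
```

Under these assumptions a single task tracker with 5 map slots should show up to 5 maps running at once, so seeing only one at a time suggests the configured maximum is not being picked up by that tracker.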
