How many nodes do you have in your map/reduce cluster? It could just be that the cluster does not have enough map slots, so all 344 maps cannot run simultaneously. Suppose you had a 4-node cluster. Then, with your configuration (mapred.tasktracker.map.tasks.maximum = 5), you would have a total of 20 map slots. So you would see 20 mappers start, and then as each mapper finishes, another would move from pending to started. This could give the illusion that mappers are running one at a time, though at any given time 20 are running concurrently.
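
To make the slot arithmetic concrete, here is a rough sketch in Python. The function name map_waves and the idea of counting "waves" are only illustrative assumptions (real tasks finish at different times, so the scheduler refills slots continuously rather than in strict waves), and the 4-node cluster is the hypothetical from above, not your actual setup.

```python
import math

def map_waves(num_nodes, slots_per_node, num_map_tasks):
    """Estimate total map slots and the approximate number of scheduling waves.

    slots_per_node corresponds to mapred.tasktracker.map.tasks.maximum.
    The wave count is a simplification that assumes all tasks take equally long.
    """
    total_slots = num_nodes * slots_per_node
    waves = math.ceil(num_map_tasks / total_slots)
    return total_slots, waves

# Hypothetical 4-node cluster, 5 map slots per task tracker, 344 map tasks.
total_slots, waves = map_waves(num_nodes=4, slots_per_node=5, num_map_tasks=344)
print(total_slots, waves)  # 20 18
```

So with 20 slots the 344 maps would go through in roughly 18 rounds, and on a single node with 5 slots it would take about 69.
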
Also, you could potentially decrease the number of mappers being run by setting a larger mapred.min.split.size, so that each mapper processes a bigger input split.

Ashish
________________________________________
From: Josh Ferguson [[email protected]]
Sent: Tuesday, January 27, 2009 9:20 PM
To: [email protected]
Subject: Number of tasks

Ok so I'm experimenting with the slow-running Hive query I was having earlier. It was indeed only processing one map task at a time even though I *think* I told it to do more. Anyone who is good with Hadoop feel free to speak up here as well, this is my first foray into trying to set up jobs for production.

Here is the relevant configuration used on the job tracker and task tracker machines.

<property>
  <name>mapred.map.tasks</name>
  <value>7</value>
  <description>The default number of map tasks per job. Typically set to a prime several times greater than number of available hosts. Ignored when mapred.job.tracker is "local".</description>
</property>

<property>
  <name>mapred.reduce.parallel.copies</name>
  <value>20</value>
  <description>The default number of parallel transfers run by reduce during the copy(shuffle) phase.</description>
</property>

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>5</value>
  <description>The maximum number of map tasks that will be run simultaneously by a task tracker.</description>
</property>

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>5</value>
  <description>The maximum number of reduce tasks that will be run simultaneously by a task tracker.</description>
</property>

The query was

SELECT COUNT(DISTINCT(table.field)) FROM table;

Anyone know why this might only be running one map task at a time? It takes about 5 minutes to go through 344 of them at this rate.

Josh Ferguson
