RE: Number of tasks

Ashish Thusoo Wed, 28 Jan 2009 07:43:13 -0800

How many nodes do you have in your map/reduce cluster? It could just be the 
case that the cluster does not have enough map slots, so all 344 maps cannot 
be run simultaneously. Suppose you had a 4 node cluster. Then by your 
configuration you would have a total of 20 map slots (4 nodes x 5 map slots 
per task tracker). So you would see 20 mappers started off, and then as each 
mapper finishes another would move from pending to started. This could give 
an illusion that mappers are running one at a time, though at any time 20 are 
running concurrently.
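
To make the slot arithmetic concrete, here is a minimal Python sketch. The 
4 node cluster is the hypothetical one from the paragraph above; the 5 slots 
per node and the 344 maps come from the original message. The function names 
are made up for illustration, and real scheduling is not done in strict 
"waves" (a new task starts whenever a slot frees up), so treat the wave count 
as a rough lower bound on the number of rounds.

```python
import math


def total_map_slots(nodes: int, slots_per_node: int) -> int:
    """Cluster-wide map slots: one task tracker per node, each with
    mapred.tasktracker.map.tasks.maximum slots (assumed uniform)."""
    return nodes * slots_per_node


def min_waves(total_maps: int, slots: int) -> int:
    """Minimum number of rounds needed if every slot is kept busy."""
    return math.ceil(total_maps / slots)


slots = total_map_slots(nodes=4, slots_per_node=5)  # hypothetical 4 nodes
waves = min_waves(total_maps=344, slots=slots)
print(f"{slots} map slots, at least {waves} rounds for 344 maps")
```

With only one node the same calculation gives 5 slots, so at most 5 mappers 
would ever show as running at once.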

Also you could potentially decrease the number of mappers being run by setting 
mapred.min.split.size to a larger value, so that each mapper processes a 
bigger split.
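
As a rough illustration of why a larger minimum split size means fewer 
mappers, the sketch below assumes the input format follows the split-size 
rule used by Hadoop's FileInputFormat in the old mapred API, 
max(minSize, min(goalSize, blockSize)), where goalSize is the total input 
size divided by mapred.map.tasks. The 1 GB file and 64 MB block size are 
hypothetical numbers, not taken from the original message, and the estimate 
ignores per-file boundaries and the small slack Hadoop allows on the last 
split.

```python
import math

MB = 1024 * 1024


def split_size(goal_size: int, min_size: int, block_size: int) -> int:
    """Assumed rule: never below min_size, otherwise capped by the
    smaller of the goal size and the block size."""
    return max(min_size, min(goal_size, block_size))


def estimated_maps(file_size: int, map_tasks_hint: int,
                   min_size: int, block_size: int) -> int:
    """Approximate number of map tasks for a single input file."""
    goal_size = file_size // max(map_tasks_hint, 1)
    return math.ceil(file_size / split_size(goal_size, min_size, block_size))


file_size = 1024 * MB   # hypothetical 1 GB input file
block_size = 64 * MB    # hypothetical block size

# Minimum split size of 1 byte: splits follow the block size.
print(estimated_maps(file_size, 7, 1, block_size))

# Minimum split size raised to 256 MB: far fewer, larger splits.
print(estimated_maps(file_size, 7, 256 * MB, block_size))
```

Under these assumptions the mapred.map.tasks value of 7 has no effect on the 
split count, because the goal size it implies is larger than the block size.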

Ashish

________________________________________
From: Josh Ferguson [[email protected]]
Sent: Tuesday, January 27, 2009 9:20 PM
To: [email protected]
Subject: Number of tasks

Ok so I'm experimenting with the slow running Hive query I was having
earlier. It was indeed only processing one map task at a time even
though I *think* I told it to do more. Anyone who is good with Hadoop
feel free to speak up here as well, this is my first foray into trying
to set up jobs for production. Here is the relevant configuration used
on the job tracker and task tracker machines.

   <property>
     <name>mapred.map.tasks</name>
     <value>7</value>
     <description>The default number of map tasks per job.  Typically set
     to a prime several times greater than number of available hosts.
     Ignored when mapred.job.tracker is "local".
     </description>
   </property>

   <property>
     <name>mapred.reduce.parallel.copies</name>
     <value>20</value>
     <description>The default number of parallel transfers run by reduce
     during the copy(shuffle) phase.
     </description>
   </property>

   <property>
     <name>mapred.tasktracker.map.tasks.maximum</name>
     <value>5</value>
     <description>The maximum number of map tasks that will be run
     simultaneously by a task tracker.
     </description>
   </property>

   <property>
     <name>mapred.tasktracker.reduce.tasks.maximum</name>
     <value>5</value>
     <description>The maximum number of reduce tasks that will be run
     simultaneously by a task tracker.
     </description>
   </property>

The query was SELECT COUNT(DISTINCT(table.field)) FROM table;

Anyone know why this might only be running one map task at a time?
Takes about 5 minutes to go through 344 of them at this rate.

Josh Ferguson
