I only have one task tracker right now because I'm just setting up some
testing. But that one machine only runs 1 mapper at a time. In the job
tracker web interface I only ever see 1 job running at a time, and no jobs
ever start simultaneously from what I can tell. Is it expected behavior for
a single task tracker to spawn *only* 1 child JVM at a time to run maps for
a single job? How do I get it to spawn 4-6 children for map tasks at once?
Josh Ferguson

On Wed, Jan 28, 2009 at 7:38 AM, Ashish Thusoo <[email protected]> wrote:

> How many nodes do you have in your map/reduce cluster? It could just be the
> case that the cluster does not have enough map slots, so all 344 maps cannot
> be run simultaneously. Suppose you had a 4 node cluster. Then by your
> configuration (5 map slots per task tracker) you would have a total of 20
> map slots. So you would see 20 mappers started, and then as each mapper
> finishes another would move from pending to started. This could give the
> illusion that mappers are running one at a time, though at any time 20 are
> running concurrently.
>
> Also you could potentially decrease the number of mappers being run by
> setting mapred.min.split.size.
>
> Ashish
>
> ________________________________________
> From: Josh Ferguson [[email protected]]
> Sent: Tuesday, January 27, 2009 9:20 PM
> To: [email protected]
> Subject: Number of tasks
>
> Ok so I'm experimenting with the slow-running Hive query I was having
> trouble with earlier. It was indeed only processing one map task at a time
> even though I *think* I told it to do more. Anyone who is good with Hadoop
> feel free to speak up here as well, this is my first foray into trying
> to set up jobs for production. Here is the relevant configuration used
> on the job tracker and task tracker machines.
>
>   <property>
>     <name>mapred.map.tasks</name>
>     <value>7</value>
>     <description>The default number of map tasks per job.  Typically set
>     to a prime several times greater than number of available hosts.
>     Ignored when mapred.job.tracker is "local".
>     </description>
>   </property>
>
>   <property>
>     <name>mapred.reduce.parallel.copies</name>
>     <value>20</value>
>     <description>The default number of parallel transfers run by reduce
>     during the copy(shuffle) phase.
>     </description>
>   </property>
>
>   <property>
>     <name>mapred.tasktracker.map.tasks.maximum</name>
>     <value>5</value>
>     <description>The maximum number of map tasks that will be run
>     simultaneously by a task tracker.
>     </description>
>   </property>
>
>   <property>
>     <name>mapred.tasktracker.reduce.tasks.maximum</name>
>     <value>5</value>
>     <description>The maximum number of reduce tasks that will be run
>     simultaneously by a task tracker.
>     </description>
>   </property>
>
> The query was SELECT COUNT(DISTINCT(table.field)) FROM table;
>
> Anyone know why this might only be running one map task at a time?
> Takes about 5 minutes to go through 344 of them at this rate.
>
> Josh Ferguson
>
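
For reference, the mapred.min.split.size suggestion quoted above might look
like the fragment below. This is only a sketch: the 128 MB value is an
illustrative assumption rather than a figure from this thread, and I'm
assuming it goes in hadoop-site.xml next to the properties already shown.

```xml
<property>
  <name>mapred.min.split.size</name>
  <!-- Assumed value for illustration only: 134217728 bytes = 128 MB -->
  <value>134217728</value>
  <description>Lower bound, in bytes, on the size of an input split.
  A larger value generally yields fewer, bigger splits and so fewer map
  tasks, depending on the input format and file sizes.
  </description>
</property>
```

Whether this actually brings the 344 maps down depends on how the input is
laid out; many small files would likely still produce at least one split
each.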
