I have found that this parameter tends to be the limiting factor:
<property>
<name>mapred.tasktracker.tasks.maximum</name>
<value>3</value>
<description>The maximum number of tasks that will be run
simultaneously by a task tracker.
</description>
</property>
Several competing constraints are at work, which makes it hard to
determine in advance just how many map tasks will run at once.
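As a rough sketch of how you might raise that limit: override the property in
the site configuration on each worker node. I am assuming here that
conf/hadoop-site.xml is where you keep site overrides, and the value 4 is only
an illustration, not a recommendation; pick something that suits your cores
and memory. As far as I know, the task trackers read this at startup, so they
would need a restart to pick up the change.

```xml
<!-- Assumed location: conf/hadoop-site.xml on each task tracker node -->
<property>
  <name>mapred.tasktracker.tasks.maximum</name>
  <!-- Illustrative value only; tune to the machine's CPU and memory -->
  <value>4</value>
  <description>The maximum number of tasks that will be run
  simultaneously by a task tracker.
  </description>
</property>
```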
On 9/17/07 5:01 AM, "Toby DiPasquale" <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> No matter what I try, the number of mapper tasks on a given machine is
> always 2. JobConf.setNumMapTasks(X) has no effect, nor does setting
> mapred.map.tasks in the mapred-default.xml configuration. Why are
> these settings ignored? How can I truly increase the number of map
> tasks on a given machine?
>
> I ran a job last night (using 0.14.1) that took 31.5 minutes to map
> 7.5 GB (on HDFS, not s3fs) and then 78 seconds to reduce the results
> of that map (starting from 15% complete when the map phase hit 100%).
> The map took so long because only 6 - 8 out of the 171 mappers were
> running at any one time. I'd really like to know how to move the
> needle on this one so if anyone has any insight, I'd really appreciate
> it. Thanks.