pig-user  

Re: Any way to set the number of MapTasks

Alan Gates
Fri, 11 Apr 2008 15:21:30 -0700

In general, Pig leaves the number of maps up to Hadoop. There is a bug in recent code that causes the number of maps to be set to 1 for some queries (see http://issues.apache.org/jira/browse/PIG-204). You may be encountering that bug.
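To illustrate what "up to Hadoop" means in practice, here is a minimal Java sketch that assumes the split logic of Hadoop's older `org.apache.hadoop.mapred.FileInputFormat` (as found in later 0.x releases): `mapred.map.tasks` only sets a goal split size, which is then capped at the HDFS block size. The class and method names are hypothetical, the 64 MB block size and 1-byte minimum split size are assumed defaults, and non-splittable or compressed input is ignored.

```java
// Hypothetical, simplified model of how the old mapred FileInputFormat turns
// the mapred.map.tasks hint into input splits (one map task per split).
// Assumes a single splittable, uncompressed input file.
public class SplitSizeSketch {

    // A final split may be up to 10% larger than splitSize instead of
    // producing a tiny extra split.
    private static final double SPLIT_SLOP = 1.1;

    // splitSize = max(minSize, min(goalSize, blockSize))
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Number of splits (and hence map tasks) for one file of fileSize bytes.
    static int countSplits(long fileSize, int mapTasksHint, long minSize, long blockSize) {
        // The hint only determines the goal size; it is not a hard count.
        long goalSize = fileSize / Math.max(mapTasksHint, 1);
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);

        int splits = 0;
        long remaining = fileSize;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // leftover bytes form the last split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileSize = 5400L * mb; // roughly the 5.4 GB file from this thread
        long blockSize = 64L * mb;  // assumed HDFS block size
        long minSize = 1L;          // assumed mapred.min.split.size default
        int splits = countSplits(fileSize, 7, minSize, blockSize);
        System.out.println("mapred.map.tasks=7 -> " + splits + " map tasks");
    }
}
```

Under these assumptions the program prints `mapred.map.tasks=7 -> 85 map tasks`: the hint of 7 gives a goal of about 771 MB per split, but the block size caps it at 64 MB, so Hadoop itself would schedule roughly one map per block. Seeing a single map for a large file therefore points at something upstream, such as the PIG-204 bug, rather than at Hadoop's splitting.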

Alan.

mickey hsieh wrote:
I tried to figure out how Pig sets the number of tasks for Map and Reduce jobs.

The number of Map tasks is always tied to the number of input files.
Since there is one input file, the number of Map tasks is 1, even though I had a
5.4 GB file with more than 1000 blocks.
Setting mapred.map.tasks has no effect whatsoever:

<property>
<name>mapred.map.tasks</name>
<value>7</value>
<description>The default number of map tasks per job. Typically set
to a prime close to the number of available hosts. Ignored when
mapred.job.tracker is "local".
</description>
</property>

The number of Reduce tasks can be set in hadoop-site.xml:
<property>
<name>mapred.reduce.tasks</name>
<value>2</value>
<description>The default number of reduce tasks per job. Typically set
to a prime close to the number of available hosts. Ignored when
mapred.job.tracker is "local".
</description>
</property>

Please advise,

Mickey Hsieh