Number of tasks

Josh Ferguson Tue, 27 Jan 2009 21:20:26 -0800

Ok so I'm experimenting with the slow-running Hive query I was having earlier. It was indeed only processing one map task at a time, even though I *think* I told it to do more. Anyone who is good with Hadoop, feel free to speak up here as well; this is my first foray into trying to set up jobs for production. Here is the relevant configuration used on the job tracker and task tracker machines.


  <property>
    <name>mapred.map.tasks</name>
    <value>7</value>
    <description>The default number of map tasks per job. Typically set
    to a prime several times greater than number of available hosts.
    Ignored when mapred.job.tracker is "local".
    </description>
  </property>


  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>20</value>
    <description>The default number of parallel transfers run by reduce
    during the copy(shuffle) phase.
    </description>
  </property>

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>5</value>
    <description>The maximum number of map tasks that will be run
    simultaneously by a task tracker.
    </description>
  </property>

  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>5</value>
    <description>The maximum number of reduce tasks that will be run
    simultaneously by a task tracker.
    </description>
  </property>

The query was SELECT COUNT(DISTINCT(table.field)) FROM table;

Anyone know why this might only be running one map task at a time? It takes about 5 minutes to get through 344 of them at this rate.
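For a rough sense of what the settings above should allow, here is a back-of-the-envelope estimate. It is a sketch under stated assumptions: $T = 344$ map tasks, $s = 5$ map slots per task tracker (from `mapred.tasktracker.map.tasks.maximum`), $n$ task trackers (the actual count is not given here), all tasks taking roughly the same time, and task start-up and scheduling overhead ignored. The average per-task time $\bar t$ is taken from the observed 5 minutes for 344 tasks run one at a time.

```latex
W = \left\lceil \frac{T}{n\,s} \right\rceil, \qquad
W_{n=1} = \left\lceil \frac{344}{5} \right\rceil = 69

\bar t \approx \frac{300\ \text{s}}{344} \approx 0.87\ \text{s}, \qquad
W_{n=1}\,\bar t \approx 69 \times 0.87\ \text{s} \approx 60\ \text{s}
```

So even with a single task tracker honoring the 5-slot limit, the same job would be expected to finish in roughly a minute rather than five, and more trackers would shrink it further.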


Josh Ferguson
