> On Wed, Aug 5, 2009 at 9:38 AM, Jothi Padmanabhan 
> <[email protected]>wrote:
> Hi,
>
> Could you please try setting this parameter
> mapred.merge.recordsBeforeProgress to a lower number?
> See https://issues.apache.org/jira/browse/HADOOP-4714
>
> Cheers
> Jothi


Hm, that bug looks like it's applicable during the merge, but my case is a
block right before the merge (but seemingly right after all of the map tasks
finish).
I tried putting mapred.merge.recordsBeforeProgress to 100, and it didn't
make a difference.

On Wed, Aug 5, 2009 at 10:32 AM, Amogh Vasekar <[email protected]> wrote:

> 10 mins reminds me of parameter mapred.task.timeout . This is configurable.
> Or alternatively you might just do a sysout to let tracker know of its
> existence ( not an ideal solution though )
>
> Thanks,
> Amogh


Well, the map tasks take around 30 minutes to run. Letting the task idle for
a large number of minutes after that is a lot of useless time, imho. I tried
with 20 minutes now, but I still get timeouts.

I don't know if it's useful, but here are the settings of the map tasks at
the moment:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>3</value>
    <description>The total amount of buffer memory to use while sorting
    files, in megabytes.  By default, gives each merge stream 1MB, which
    should minimize seeks.</description>
  </property>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
  <description>The maximum number of map tasks that will be run
  simultaneously by a task tracker.
  </description>
</property>

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
  <description>The maximum number of reduce tasks that will be run
  simultaneously by a task tracker.
  </description>
</property>

<property>
  <name>mapred.max.split.size</name>
  <value>1000000</value>
</property>

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx400m</value>
</property>

<property>
<name>mapred.merge.recordsBeforeProgress</name>
<value>100</value>
</property>

<property>
<name>mapred.task.timeout</name>
<value>1200000</value>
</property>

</configuration>

Ideally, I would want to get rid of the delay that causes the timeouts, yet
also increase the split size somewhat (though I think a larger split size
would increase the delay even more?).
The map tasks take around 8000-11000 records as input, and can produce up to
1 000 000 records as output (in case this is relevant).

Mathias

Reply via email to