On Wed, Aug 5, 2009 at 9:38 AM, Jothi Padmanabhan <[email protected]> wrote:
> Hi,
>
> Could you please try setting this parameter
> mapred.merge.recordsBeforeProgress to a lower number?
> See https://issues.apache.org/jira/browse/HADOOP-4714
>
> Cheers
> Jothi

Hm, that bug looks like it applies during the merge, but in my case the hang happens right before the merge (though seemingly right after all of the map tasks finish). I tried setting mapred.merge.recordsBeforeProgress to 100, and it didn't make a difference.

On Wed, Aug 5, 2009 at 10:32 AM, Amogh Vasekar <[email protected]> wrote:
> 10 mins reminds me of parameter mapred.task.timeout . This is configurable.
> Or alternatively you might just do a sysout to let tracker know of its
> existence ( not an ideal solution though )
>
> Thanks,
> Amogh

Well, the map tasks take around 30 minutes to run. Letting a task sit idle for many more minutes after that is a lot of wasted time, imho. I tried a timeout of 20 minutes now, but I still get timeouts.

I don't know if it's useful, but here are the settings of the map tasks at the moment:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>3</value>
    <description>The total amount of buffer memory to use while sorting
    files, in megabytes. By default, gives each merge stream 1MB, which
    should minimize seeks.</description>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
    <description>The maximum number of map tasks that will be run
    simultaneously by a task tracker.</description>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
    <description>The maximum number of reduce tasks that will be run
    simultaneously by a task tracker.</description>
  </property>
  <property>
    <name>mapred.max.split.size</name>
    <value>1000000</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx400m</value>
  </property>
  <property>
    <name>mapred.merge.recordsBeforeProgress</name>
    <value>100</value>
  </property>
  <property>
    <name>mapred.task.timeout</name>
    <value>1200000</value>
  </property>
</configuration>

Ideally, I would want to get rid of the delay that causes the timeouts, yet also increase the split size somewhat (though I think a larger split size would increase the delay even more?). Each map task takes around 8,000-11,000 records as input and can produce up to 1,000,000 records as output (in case this is relevant).

Mathias
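
P.S. To put rough numbers on the settings above, here is a back-of-the-envelope sketch in Python. It only does arithmetic on values from the config (io.sort.mb, mapred.task.timeout) and the 1,000,000-record upper bound. The average output record size (ASSUMED_RECORD_BYTES) is a guess, not something measured on this job, and the function names are purely illustrative. It also ignores any spill threshold and per-record bookkeeping, so the result is at best a lower bound on how often the sort buffer fills.

```python
import math

# Values taken from the configuration in this message.
IO_SORT_MB = 3                   # io.sort.mb
TASK_TIMEOUT_MS = 1_200_000      # mapred.task.timeout
MAX_OUTPUT_RECORDS = 1_000_000   # upper bound on records emitted per map task

# ASSUMPTION: average serialized size of one output record (key + value).
# This number is not from the job; replace it with a measured value.
ASSUMED_RECORD_BYTES = 100


def timeout_minutes(timeout_ms: int) -> float:
    """Convert a millisecond timeout to minutes."""
    return timeout_ms / 60_000


def estimated_buffer_fills(records: int, record_bytes: int, sort_mb: int) -> int:
    """Roughly how many times the sort buffer would fill for one map task.

    Ignores spill thresholds and per-record metadata, so the real number
    of spills is likely higher than this estimate.
    """
    total_bytes = records * record_bytes
    buffer_bytes = sort_mb * 1024 * 1024
    return math.ceil(total_bytes / buffer_bytes)


if __name__ == "__main__":
    print(f"task timeout: {timeout_minutes(TASK_TIMEOUT_MS):.0f} minutes")
    fills = estimated_buffer_fills(
        MAX_OUTPUT_RECORDS, ASSUMED_RECORD_BYTES, IO_SORT_MB
    )
    print(f"estimated sort-buffer fills per map task: {fills}")
```

With the assumed 100-byte records this gives 32 buffer fills per map task, so under that assumption each task would have a few dozen spill files to combine once the map function itself is done.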
