> but I didn't find a config option
> that allows ignoring tasks that fail.

If 0.18,
http://hadoop.apache.org/common/docs/r0.18.3/api/org/apache/hadoop/mapred/JobConf.html#setMaxMapTaskFailuresPercent(int)
(mapred.max.map.failures.percent)
http://hadoop.apache.org/common/docs/r0.18.3/api/org/apache/hadoop/mapred/JobConf.html#setMaxReduceTaskFailuresPercent(int)
(mapred.max.reduce.failures.percent)

If 0.19 or later, you can also try skipping records.

Koji

On 8/9/09 2:18 AM, "Mathias De Maré" <[email protected]> wrote:

> I changed the maximum split size to 30000, and now most tasks actually
> succeed.
> However, I still have the failure problem with some tasks (with a job I was
> running yesterday, I got a failure after 1900 tasks).
> The problem is that these very few failures can bring down the entire job,
> as they sometimes seem to just keep failing.
> I looked through mapred-default.xml, but I didn't find a config option
> that allows ignoring tasks that fail. Is there a way to do this (it seems
> like the only alternative I have, since I can't make the failures stop)?
>
> Mathias
>
> 2009/8/5 Mathias De Maré <[email protected]>
>
>> On Wed, Aug 5, 2009 at 9:38 AM, Jothi Padmanabhan
>> <[email protected]> wrote:
>>> Hi,
>>>
>>> Could you please try setting this parameter
>>> mapred.merge.recordsBeforeProgress to a lower number?
>>> See https://issues.apache.org/jira/browse/HADOOP-4714
>>>
>>> Cheers
>>> Jothi
>>
>> Hm, that bug looks like it's applicable during the merge, but my case is a
>> block right before the merge (but seemingly right after all of the map
>> tasks finish).
>> I tried setting mapred.merge.recordsBeforeProgress to 100, and it didn't
>> make a difference.
>>
>> On Wed, Aug 5, 2009 at 10:32 AM, Amogh Vasekar <[email protected]> wrote:
>>
>>> 10 mins reminds me of the parameter mapred.task.timeout. This is
>>> configurable. Or alternatively you might just do a sysout to let the
>>> tracker know of its existence (not an ideal solution, though).
>>>
>>> Thanks,
>>> Amogh
>>
>> Well, the map tasks take around 30 minutes to run. Letting the task idle
>> for a large number of minutes after that is a lot of useless time, imho.
>> I tried with 20 minutes now, but I still get timeouts.
>>
>> I don't know if it's useful, but here are the settings of the map tasks
>> at the moment:
>>
>> <configuration>
>>   <property>
>>     <name>mapred.job.tracker</name>
>>     <value>localhost:9001</value>
>>   </property>
>>   <property>
>>     <name>io.sort.mb</name>
>>     <value>3</value>
>>     <description>The total amount of buffer memory to use while sorting
>>     files, in megabytes. By default, gives each merge stream 1MB, which
>>     should minimize seeks.</description>
>>   </property>
>>   <property>
>>     <name>mapred.tasktracker.map.tasks.maximum</name>
>>     <value>4</value>
>>     <description>The maximum number of map tasks that will be run
>>     simultaneously by a task tracker.
>>     </description>
>>   </property>
>>   <property>
>>     <name>mapred.tasktracker.reduce.tasks.maximum</name>
>>     <value>4</value>
>>     <description>The maximum number of reduce tasks that will be run
>>     simultaneously by a task tracker.
>>     </description>
>>   </property>
>>   <property>
>>     <name>mapred.max.split.size</name>
>>     <value>1000000</value>
>>   </property>
>>   <property>
>>     <name>mapred.child.java.opts</name>
>>     <value>-Xmx400m</value>
>>   </property>
>>   <property>
>>     <name>mapred.merge.recordsBeforeProgress</name>
>>     <value>100</value>
>>   </property>
>>   <property>
>>     <name>mapred.task.timeout</name>
>>     <value>1200000</value>
>>   </property>
>> </configuration>
>>
>> Ideally, I would want to get rid of the delay that causes the timeouts,
>> yet also increase the split size somewhat (though I think a larger split
>> size would increase the delay even more?).
>> The map tasks take around 8000-11000 records as input, and can produce
>> up to 1 000 000 records as output (in case this is relevant).
>>
>> Mathias
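For reference, the failure-tolerance settings Koji points to above could be added to a job configuration like the one Mathias posted. A minimal sketch, assuming Hadoop 0.18 or later; the 10% values are illustrative, not taken from the thread:

```xml
<!-- Sketch only: let the job succeed even if up to 10% of its map or
     reduce tasks fail permanently. Values are illustrative. -->
<property>
  <name>mapred.max.map.failures.percent</name>
  <value>10</value>
</property>
<property>
  <name>mapred.max.reduce.failures.percent</name>
  <value>10</value>
</property>
```

The same settings can also be made from Java via JobConf.setMaxMapTaskFailuresPercent(int) and JobConf.setMaxReduceTaskFailuresPercent(int), per the API links above.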
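On the "skipping records" suggestion for 0.19 or later: this refers to Hadoop's skip mode (the SkipBadRecords helper), where the framework re-executes a repeatedly failing task and narrows down and skips the record(s) that make it crash. A hedged sketch using the 0.19-era property names; the values are illustrative, and the exact names should be checked against the docs for your release:

```xml
<!-- Sketch only: after 2 failed attempts, start skip mode and allow
     up to 1 bad input record per map task to be skipped. Property
     names follow the 0.19-era SkipBadRecords API. -->
<property>
  <name>mapred.skip.attempts.to.start.skipping</name>
  <value>2</value>
</property>
<property>
  <name>mapred.skip.map.max.skip.records</name>
  <value>1</value>
</property>
```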
