> but I didn't find a config option
> that allows ignoring tasks that fail.

If 0.18,
http://hadoop.apache.org/common/docs/r0.18.3/api/org/apache/hadoop/mapred/JobConf.html#setMaxMapTaskFailuresPercent(int)
(mapred.max.map.failures.percent)
http://hadoop.apache.org/common/docs/r0.18.3/api/org/apache/hadoop/mapred/JobConf.html#setMaxReduceTaskFailuresPercent(int)
(mapred.max.reduce.failures.percent)

If 0.19 or later, you can also try skipping records.

Koji

On 8/9/09 2:18 AM, "Mathias De Maré" <[email protected]> wrote:

> I changed the maximum split size to 30000, and now most tasks actually
> succeed.
> However, I still have the failure problem with some tasks (with a job I was
> running yesterday, I got a failure after 1900 tasks).
> The problem is that these very few failures can bring down the entire job,
> as they sometimes seem to just keep failing.
> I looked through mapred-default.xml, but I didn't find a config option
> that allows ignoring tasks that fail. Is there a way to do this (it seems
> like the only alternative I have, since I can't make the failures stop)?
>
> Mathias
>
> 2009/8/5 Mathias De Maré <[email protected]>
>
>> On Wed, Aug 5, 2009 at 9:38 AM, Jothi Padmanabhan
>> <[email protected]> wrote:
>>> Hi,
>>>
>>> Could you please try setting this parameter
>>> mapred.merge.recordsBeforeProgress to a lower number?
>>> See https://issues.apache.org/jira/browse/HADOOP-4714
>>>
>>> Cheers
>>> Jothi
>>
>> Hm, that bug looks like it's applicable during the merge, but my case is a
>> block right before the merge (but seemingly right after all of the map
>> tasks finish).
>> I tried setting mapred.merge.recordsBeforeProgress to 100, and it didn't
>> make a difference.
>>
>> On Wed, Aug 5, 2009 at 10:32 AM, Amogh Vasekar <[email protected]> wrote:
>>
>>> 10 mins reminds me of the parameter mapred.task.timeout. This is
>>> configurable. Or alternatively you might just do a sysout to let the
>>> tracker know of its existence (not an ideal solution, though).
>>>
>>> Thanks,
>>> Amogh
>>
>> Well, the map tasks take around 30 minutes to run. Letting the task idle
>> for a large number of minutes after that is a lot of useless time, imho.
>> I tried with 20 minutes now, but I still get timeouts.
>>
>> I don't know if it's useful, but here are the settings of the map tasks
>> at the moment:
>>
>> <configuration>
>>   <property>
>>     <name>mapred.job.tracker</name>
>>     <value>localhost:9001</value>
>>   </property>
>>   <property>
>>     <name>io.sort.mb</name>
>>     <value>3</value>
>>     <description>The total amount of buffer memory to use while sorting
>>     files, in megabytes. By default, gives each merge stream 1MB, which
>>     should minimize seeks.</description>
>>   </property>
>>   <property>
>>     <name>mapred.tasktracker.map.tasks.maximum</name>
>>     <value>4</value>
>>     <description>The maximum number of map tasks that will be run
>>     simultaneously by a task tracker.
>>     </description>
>>   </property>
>>   <property>
>>     <name>mapred.tasktracker.reduce.tasks.maximum</name>
>>     <value>4</value>
>>     <description>The maximum number of reduce tasks that will be run
>>     simultaneously by a task tracker.
>>     </description>
>>   </property>
>>   <property>
>>     <name>mapred.max.split.size</name>
>>     <value>1000000</value>
>>   </property>
>>   <property>
>>     <name>mapred.child.java.opts</name>
>>     <value>-Xmx400m</value>
>>   </property>
>>   <property>
>>     <name>mapred.merge.recordsBeforeProgress</name>
>>     <value>100</value>
>>   </property>
>>   <property>
>>     <name>mapred.task.timeout</name>
>>     <value>1200000</value>
>>   </property>
>> </configuration>
>>
>> Ideally, I would want to get rid of the delay that causes the timeouts,
>> yet also increase the split size somewhat (though I think a larger split
>> size would increase the delay even more?).
>> The map tasks take around 8000-11000 records as input, and can produce
>> up to 1 000 000 records as output (in case this is relevant).
>>
>> Mathias
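For reference, the failure-tolerance settings Koji points to above could be added to a job configuration like the one Mathias posted. A minimal sketch, assuming Hadoop 0.18 or later; the 10% values are illustrative, not taken from the thread:

```xml
<!-- Sketch only: let the job succeed even if up to 10% of its map or
     reduce tasks fail permanently. Values are illustrative. -->
<property>
  <name>mapred.max.map.failures.percent</name>
  <value>10</value>
</property>
<property>
  <name>mapred.max.reduce.failures.percent</name>
  <value>10</value>
</property>
```

The same settings can also be made from Java via JobConf.setMaxMapTaskFailuresPercent(int) and JobConf.setMaxReduceTaskFailuresPercent(int), per the API links above.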
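On the "skipping records" suggestion for 0.19 or later: this refers to Hadoop's skip mode (the SkipBadRecords helper), where the framework re-executes a repeatedly failing task and narrows down and skips the record(s) that make it crash. A hedged sketch using the 0.19-era property names; the values are illustrative, and the exact names should be checked against the docs for your release:

```xml
<!-- Sketch only: after 2 failed attempts, start skip mode and allow
     up to 1 bad input record per map task to be skipped. Property
     names follow the 0.19-era SkipBadRecords API. -->
<property>
  <name>mapred.skip.attempts.to.start.skipping</name>
  <value>2</value>
</property>
<property>
  <name>mapred.skip.map.max.skip.records</name>
  <value>1</value>
</property>
```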
