On Thu, Aug 16, 2007 at 07:36:48AM -0700, Phantom wrote:
>Hi
>
>My Reduce jobs do not write any data to disk but fire off a network call to
>an RPC server with the data. However all reduce jobs are getting killed with
>the following the error message :
>
>Task failed to report status for 606 seconds. Killing.
>Task failed to report status for 602 seconds. Killing.
>Task failed to report status for 600 seconds. Killing.
>
>What might be causing this? How do I start addressing this?
>

I'd bet your RPC calls are taking too long; the reduce task isn't reporting 
any progress in the meantime, so after the default 10-minute timeout the 
TaskTracker kills it.

Couple of options:
a) Set 'mapred.task.timeout' to a higher value (it's in milliseconds; 0 
disables the timeout entirely)
b) Use 
http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/Reporter.html#setStatus(java.lang.String)
 or 
http://lucene.apache.org/hadoop/api/org/apache/hadoop/util/Progressable.html#progress()
 from your reducer periodically to tell the TaskTracker that you are alive and 
kicking.
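
If you go with (a), a hadoop-site.xml fragment would look something like the 
following (the 20-minute value is just an example, not a recommendation):

```xml
<!-- hadoop-site.xml: per-task timeout in milliseconds; 0 disables it -->
<property>
  <name>mapred.task.timeout</name>
  <value>1200000</value> <!-- 20 minutes -->
</property>
```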

I'd do (b).
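
For (b), you can either call reporter.progress() inside your reduce() loop, or 
run a small heartbeat thread that reports progress while the RPC blocks. Here's 
a self-contained sketch of the heartbeat pattern; the Progressable interface 
below is a stand-in for Hadoop's org.apache.hadoop.util.Progressable, and the 
names and timings are illustrative, not from your job:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class HeartbeatDemo {
    // Stand-in for Hadoop's org.apache.hadoop.util.Progressable
    interface Progressable {
        void progress();
    }

    // Calls reporter.progress() on a fixed interval while the slow RPC runs,
    // so the TaskTracker keeps seeing a live task instead of a silent one.
    static void callWithHeartbeat(Runnable slowRpc, Progressable reporter,
                                  long periodMillis) throws InterruptedException {
        ScheduledExecutorService heartbeat =
            Executors.newSingleThreadScheduledExecutor();
        heartbeat.scheduleAtFixedRate(reporter::progress,
                                      0, periodMillis, TimeUnit.MILLISECONDS);
        try {
            slowRpc.run();                // the long-running RPC call
        } finally {
            heartbeat.shutdownNow();      // stop reporting once the call returns
            heartbeat.awaitTermination(1, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger ticks = new AtomicInteger();
        // Simulate a 500 ms RPC with a 100 ms heartbeat.
        callWithHeartbeat(() -> {
            try { Thread.sleep(500); }
            catch (InterruptedException e) { throw new RuntimeException(e); }
        }, ticks::incrementAndGet, 100);
        System.out.println("progress calls: " + ticks.get());
    }
}
```

In an actual reducer you'd pass the Reporter handed to reduce() in place of the 
stand-in, since Reporter extends Progressable.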

Arun

>Thanks
>A
