On Thu, Aug 16, 2007 at 07:36:48AM -0700, Phantom wrote:
>Hi
>
>My Reduce jobs do not write any data to disk but fire off a network call to
>an RPC server with the data. However, all reduce jobs are getting killed with
>the following error message:
>
>Task failed to report status for 606 seconds. Killing.
>Task failed to report status for 602 seconds. Killing.
>Task failed to report status for 600 seconds. Killing.
>
>What might be causing this? How do I start addressing this?
>
I'd bet your RPC calls are taking too long; hence the reduce task isn't
reporting any progress, and after the default 10-minute timeout the
TaskTracker is killing your reduce task. Couple of options:

a) Set 'mapred.task.timeout' to a higher value (or zero for no timeout).

b) Periodically call
   http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/Reporter.html#setStatus(java.lang.String)
   or
   http://lucene.apache.org/hadoop/api/org/apache/hadoop/util/Progressable.html#progress()
   from your reducer to tell the TaskTracker that you are alive and kicking.

I'd do (b).

Arun

>Thanks
>A
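To make option (b) concrete, here is a rough sketch of a reducer (old
org.apache.hadoop.mapred API) that reports progress after each RPC call.
The sendOverRpc() method is a hypothetical stand-in for whatever RPC client
the original poster is using, not part of Hadoop:

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class RpcReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {

  public void reduce(Text key, Iterator<Text> values,
                     OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    long count = 0;
    while (values.hasNext()) {
      // Hypothetical stand-in for your RPC client call.
      sendOverRpc(key, values.next());
      count++;
      // Tell the TaskTracker this task is still making progress,
      // so it doesn't get killed after mapred.task.timeout.
      reporter.progress();
      reporter.setStatus("sent " + count + " values for key " + key);
    }
  }

  private void sendOverRpc(Text key, Text value) throws IOException {
    // ... your RPC call here ...
  }
}
```

Note this only helps if each individual RPC call finishes within the
timeout; if a single call can block for longer than 10 minutes, you would
also need to raise 'mapred.task.timeout' or issue progress from another
thread.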
