Hi all,

Let me preface this with my understanding of how tasks work.

If a task runs for a long time (default 10 minutes) without demonstrating
progress, the task tracker decides the process is hung, kills it, and starts a
new attempt.  Normally, one calls a Reporter instance's progress method to
provide progress updates and avoid this.  For a streaming mapper, the Reporter
class is org.apache.hadoop.mapred.Task$TaskReporter, and this works well.
Streaming is even set up to take progress, status, and counter updates from
stderr, which is really cool.
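For reference, this is the stderr protocol I mean.  A minimal word-count-style
mapper sketch (the counter group name and batch size are my own, purely
illustrative):

```python
#!/usr/bin/env python
import sys

REPORT_EVERY = 10000  # illustrative batch size

def run_mapper(lines, out=sys.stdout, err=sys.stderr, report_every=REPORT_EVERY):
    """Emit (word, 1) pairs and periodic progress updates on stderr."""
    count = 0
    for line in lines:
        for word in line.split():
            out.write("%s\t1\n" % word)
        count += 1
        if count % report_every == 0:
            # Streaming recognizes these stderr prefixes as status and
            # counter updates, which resets the task-timeout clock.
            err.write("reporter:status:processed %d records\n" % count)
            err.write("reporter:counter:MyGroup,Records,%d\n" % report_every)

if __name__ == "__main__":
    run_mapper(sys.stdin)
```

This works for streaming mappers and reducers; the point of this mail is that
the equivalent updates from a combiner appear to go nowhere.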

However, for combiner tasks, the class is org.apache.hadoop.mapred.Reporter$1,
i.e. the first anonymous inner class in that particular java file, which is
the Reporter.NULL instance that ignores all updates.  So even if a combiner
task updates its reporter in accordance with the docs (see postscript), its
updates are ignored and it dies at 10 minutes.  The only alternative seems to
be setting mapred.task.timeout very high, which lets truly hung tasks go
unrecognised for much longer.
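Concretely, the workaround I mean looks like this (the jar path, value, and
job arguments are illustrative, not from a real job) - raising the timeout to,
say, 30 minutes:

```
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
  -D mapred.task.timeout=1800000 \
  -input /in -output /out \
  -mapper mapper.py -combiner combiner.py -reducer reducer.py
```

The value is in milliseconds; setting it to 0 disables the timeout entirely,
which is even worse for catching genuinely hung tasks.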

At least, this is what I've been able to piece together from reading code and
searching the web for docs (except the hadoop jira, which has been down for a
while - my bad luck).

So am I understanding this correctly?  Are there plans to change this, or are
there reasons that combiners can't have normal reporters associated with them?

Thanks for any help,
Chris

http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Reporter
http://www.cloudera.com/blog/2009/05/10-mapreduce-tips/ (cf tip 7)
http://hadoop.apache.org/common/docs/r0.18.3/streaming.html#How+do+I+update+counters+in+streaming+applications%3F
http://hadoop.apache.org/common/docs/r0.20.0/mapred-default.html  (cf 
mapred.task.timeout)
