I have a large Hadoop streaming job that generally works fine,
but a few (2-4) of the ~3000 maps and reduces have problems.
To make matters worse, the problems are system-dependent (we run on a
cluster whose machines have slightly different OS versions).
I'd of course like to debug these problems, but they are embedded in a
large job.

Is there a way to extract the input that a particular reducer saw,
given its task identity?  (This would also be helpful for mappers.)
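
For a mapper I can almost picture doing this by hand, since each map
task's input is a single split.  Here is a rough sketch of what I mean
(the TextInputFormat assumption is mine, and the (file, offset, length)
arguments would have to be dug out of the task's logs or configuration):

    #!/usr/bin/env python3
    # Hypothetical sketch: re-read the byte range handed to one map task,
    # assuming plain TextInputFormat, where a split is a (file, offset,
    # length) triple.  The boundary handling only approximates Hadoop's
    # LineRecordReader.
    import sys

    def read_split(path, offset, length):
        """Yield the lines the map task over this split would have seen."""
        with open(path, 'rb') as f:
            if offset != 0:
                # Back up one byte and discard a line: this consumes the
                # line straddling the boundary (it belongs to the previous
                # split) but keeps a line starting exactly at offset.
                f.seek(offset - 1)
                f.readline()
            while f.tell() <= offset + length:
                line = f.readline()
                if not line:
                    break
                # The last line may run past the split end; Hadoop
                # finishes it rather than truncating.
                yield line

    if __name__ == '__main__':
        path, offset, length = sys.argv[1], int(sys.argv[2]), int(sys.argv[3])
        for line in read_split(path, offset, length):
            sys.stdout.buffer.write(line)

Piping those lines into the map program should reproduce that one
task's run.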

This is clearly technically *possible*, since Hadoop can rerun tasks
when they fail.  But is there an external program that actually does it?
Or are there instructions for poking around on the compute nodes' local
disks to assemble it by hand?  Or better suggestions?
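
For the reducer case, here is roughly what I imagine such a program
would do, assuming streaming's defaults (Text keys, i.e. everything
before the first tab, and the stock HashPartitioner); a job with a
custom partitioner would need that logic substituted:

    #!/usr/bin/env python3
    # Hypothetical sketch: replay the default partitioning over the map
    # outputs to pull out the records destined for one reducer.
    import sys

    def text_hashcode(key):
        """Mimic org.apache.hadoop.io.Text.hashCode(): a 31-multiplier
        hash over the UTF-8 bytes with Java's signed 32-bit wrap-around."""
        h = 1
        for b in key.encode('utf-8'):
            b = b - 256 if b >= 128 else b   # Java bytes are signed
            h = (31 * h + b) & 0xFFFFFFFF
        return h

    def partition(key, num_reduces):
        # HashPartitioner: (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks
        return (text_hashcode(key) & 0x7FFFFFFF) % num_reduces

    if __name__ == '__main__':
        num_reduces = int(sys.argv[1])   # the job's reduce count, e.g. 3000
        target = int(sys.argv[2])        # number from the failing task id
        for line in sys.stdin:
            key = line.rstrip('\n').split('\t', 1)[0]
            if partition(key, num_reduces) == target:
                sys.stdout.write(line)

Fed the map output (say, from rerunning the job with zero reduces) and
then sorted, that should rebuild the failing reducer's input.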

Something like this would be a real boon for people developing map and
reduce user code.

Thanks for any pointers.
   -John Heidemann
