On 28-Jul-08, at 6:33 PM, charles du wrote:
Hi: I tried to run one of my map/reduce jobs on a cluster (Hadoop 0.17.0). I used 10 reducers. Nine of them return quickly (in a few seconds), but one has been running for several hours with no sign of completion. Do you know how I can debug it or find out what is going on with this reducer?
You can log, and set the task's status message, which shows up per task in the job tracker's web UI. If you're using streaming, I think you're limited to writing to stderr. The only way I've found to read the logs on a distributed run is to ssh to the actual task box and look at the log directory. I've almost gotten frustrated enough to have my tasks send email, but not quite.
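For a Java job, something like this is what I mean (a minimal sketch against the old org.apache.hadoop.mapred API that 0.17 uses; the key/value types and the per-100k-values counting are made up for illustration, the Reporter calls are the real thing):

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class DebuggableReducer extends MapReduceBase
    implements Reducer<Text, LongWritable, Text, LongWritable> {

  public void reduce(Text key, Iterator<LongWritable> values,
      OutputCollector<Text, LongWritable> output, Reporter reporter)
      throws IOException {
    long sum = 0;
    long seen = 0;
    while (values.hasNext()) {
      sum += values.next().get();
      seen++;
      if (seen % 100000 == 0) {
        // Shows up in the task's status column in the web UI, so you can
        // see which key a stuck reducer is grinding on.
        reporter.setStatus("key=" + key + " values=" + seen);
        // Tells the framework the task is alive so it isn't killed for
        // failing to report progress.
        reporter.progress();
      }
    }
    // Goes to the task's stderr log on the tasktracker node.
    System.err.println("done key=" + key + " after " + seen + " values");
    output.collect(key, new LongWritable(sum));
  }
}

If the one slow reducer is chewing on a single giant key, the status messages will make that obvious without any log-spelunking.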
Debugging is easier on a single pseudo-distributed box because all the logs and stderr are right there, so try that if you can.
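If you haven't set one up, pseudo-distributed mode on 0.17 is just a few properties in conf/hadoop-site.xml (a sketch; localhost and these ports are the conventional choices, adjust to taste):

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

All the daemons run on one machine, so every task log ends up under the local logs directory instead of scattered across the cluster.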