MapReduce Jobs being 'stuck' for several hours and then completing

Abhinay Mehta Thu, 28 Apr 2011 06:37:41 -0700

Hi all,

We are running CDH3B4 on our Hadoop cluster.


We have hourly jobs that use the streaming API. Each of these jobs used
to take 4-5 minutes to complete, but since 1pm yesterday they have
suddenly started taking 3-4 hours.

We looked at the data the jobs are working on, and it is exactly the same
as it has always been.
Neither the cluster nor its config has been touched since the upgrade to
CDH3B4, which was one month ago.

No errors are reported in any of the logs; the jobs are just taking much
longer.
One thing I have noticed: while a job sits idle partway through, I
consistently see this entry in the slave log files:

2011-04-28 11:16:07,849 INFO org.apache.hadoop.streaming.PipeMapRed:
R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
2011-04-28 11:16:07,849 INFO org.apache.hadoop.streaming.PipeMapRed:
R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]

I see that entry in both the map and reduce phases, while the jobs sit
idle for tens of minutes at a time without doing anything.
This happens even when nothing else is running on the cluster.
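
In case it helps anyone looking at the same symptom, here is a minimal
sketch (not something we run in production) for pulling the PipeMapRed
progress entries quoted above out of a task log and flagging long silences
between them. It assumes the three R/W/S numbers are the records
read/written/skipped counters, which is my understanding of that line.
The function names and the 600-second threshold are hypothetical, chosen
only for illustration.

```python
import re
from datetime import datetime

# Matches the PipeMapRed progress entries quoted in this mail. The \s+ after
# the class name tolerates the line wrap between the class name and "R/W/S=".
PROGRESS_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} INFO "
    r"org\.apache\.hadoop\.streaming\.PipeMapRed:\s+"
    r"R/W/S=(?P<r>\d+)/(?P<w>\d+)/(?P<s>\d+)"
)


def parse_progress(log_text):
    """Return (timestamp, first, second, third R/W/S counter) tuples.

    Assumption: the counters are records read / written / skipped.
    """
    entries = []
    for m in PROGRESS_RE.finditer(log_text):
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
        entries.append((ts, int(m.group("r")), int(m.group("w")), int(m.group("s"))))
    return entries


def find_gaps(entries, threshold_seconds=600):
    """Return consecutive entry pairs separated by more than threshold_seconds.

    The 600-second default is an arbitrary illustrative value.
    """
    gaps = []
    for prev, cur in zip(entries, entries[1:]):
        if (cur[0] - prev[0]).total_seconds() > threshold_seconds:
            gaps.append((prev, cur))
    return gaps
```

Calling find_gaps(parse_progress(open(path).read())) on a task log, where
path is wherever that log lives on the slave, would list the points where
a task went quiet for longer than the threshold.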

If anyone can shed some light on this or point me in a direction to
investigate further, it would be much appreciated.

Thank you.

Regards,
Abhinay Mehta
