Re: MapReduce Jobs being 'stuck' for several hours and then completing

Abhinay Mehta Thu, 28 Apr 2011 09:10:24 -0700

Thanks Koji, I'll have a go.

On 28 April 2011 16:44, Koji Noguchi <[email protected]> wrote:


> Hi Abhinay,
>
> If you have access to the compute nodes, then
>
> 1) jstack of streaming mapper jvm
> 2) strace -f of streaming mapper jvm
> 3) strace -f of streaming map process itself
>
> might help.
>
> Koji
>
>
> On 4/28/11 3:33 AM, "Abhinay Mehta" <[email protected]> wrote:
>
> > Hi all,
> >
> > We are using CDH3B4 on the Hadoop Cluster.
> >
> > We have hourly jobs using the streaming API. Each of these jobs used
> > to take 4-5 minutes to complete, but since 1pm yesterday they have
> > suddenly started taking 3-4 hours.
> >
> > We looked at the data the jobs are working on and the data is exactly the
> > same as it always has been.
> > The cluster / config has not been touched since the upgrade to CDH3B4,
> > which was one month ago.
> >
> > No errors are being reported in any of the logs, the jobs are just taking
> > longer, much longer.
> > One thing I have noticed in the logs: while a job sits idle partway
> > through, I do see one consistent entry in the slave log files:
> >
> > 2011-04-28 11:16:07,849 INFO org.apache.hadoop.streaming.PipeMapRed:
> > R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
> > 2011-04-28 11:16:07,849 INFO org.apache.hadoop.streaming.PipeMapRed:
> > R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
> >
> > I see that entry in Map phases and Reduce phases, when the jobs just sit
> > idle for many tens of mins not doing anything.
> > This happens even if there is nothing else running on the cluster.
> >
> > If anyone can shed some light on this or give me a direction to look into
> > further then it would be much appreciated.
> >
> > Thank you.
> >
> > Regards,
> > Abhinay Mehta
>
>
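
For anyone following up on the PipeMapRed entries quoted above, here is a minimal Python sketch for pulling the R/W/S counters out of a task log. It assumes, as far as I understand the streaming code, that R/W/S stands for records read / written / skipped. The names parse_rws and looks_stalled are made up for illustration, and the "stalled" check is only a rough heuristic: zero records written right at the start of a task is normal, so it only means something if the counters stay like that for a long time.

```python
import re

# Matches the counters in a PipeMapRed progress line, e.g. "R/W/S=10/0/0".
# Assumption: the three fields are records read / written / skipped.
RWS_PATTERN = re.compile(r"R/W/S=(\d+)/(\d+)/(\d+)")


def parse_rws(text):
    """Return a list of (read, written, skipped) tuples found in text."""
    return [tuple(int(n) for n in m.groups()) for m in RWS_PATTERN.finditer(text)]


def looks_stalled(counters):
    """Hypothetical heuristic: every sample has records read but none written."""
    return bool(counters) and all(r > 0 and w == 0 for r, w, _ in counters)


# The two entries from the original report, each joined onto one line.
sample = (
    "2011-04-28 11:16:07,849 INFO org.apache.hadoop.streaming.PipeMapRed: "
    "R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]\n"
    "2011-04-28 11:16:07,849 INFO org.apache.hadoop.streaming.PipeMapRed: "
    "R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]\n"
)

counters = parse_rws(sample)
print(counters)
print(looks_stalled(counters))
```

If the last counters in a task log still show zero written long after the task started, that would be a reasonable point to take the jstack and strace output Koji suggested.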
