Re: Shuffle tasks getting killed

cliff palmer Fri, 24 Sep 2010 08:28:38 -0700

I'm glad it helped, Aniket. I would recommend starting your performance
work with your network infrastructure and the balance of data across your
logical racks.

Cliff


On Fri, Sep 24, 2010 at 12:12 AM, aniket ray <[email protected]> wrote:

> Hi Cliff,
>
> Thanks, it did turn out to be speculative execution. When I turned it off,
> no more tasks were killed, and performance degraded.
>
> So my initial assumptions were incorrect after all. I guess I'll have to
> look at other ways to improve performance.
>
> Thanks for the help.
> -aniket
>
> On Thu, Sep 23, 2010 at 5:14 PM, cliff palmer <[email protected]> wrote:
>
> > Aniket, I wonder whether these tasks were run as speculative execution
> > attempts. Have you been able to determine whether the job runs
> > successfully?
> > HTH
> > Cliff
> >
> > On Thu, Sep 23, 2010 at 12:52 AM, aniket ray <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I continuously run a series of batch jobs using Hadoop MapReduce. I also
> > > have a managing daemon that moves data around on HDFS, making way for
> > > more jobs to be run.
> > > I use the Capacity Scheduler to run many jobs in parallel.
> > >
> > > I see an issue on the Hadoop web monitoring UI at port 50030 that I
> > > believe may be causing a performance bottleneck, and I wanted to get
> > > more information.
> > >
> > > Approximately 10% of the reduce tasks show up as "Killed" in the UI. The
> > > logs say that the killed tasks are in the shuffle phase when they are
> > > killed, but the logs don't show any exception.
> > > My understanding is that these killed tasks are started again, and that
> > > this slows down the whole Hadoop job.
> > > What might the possible causes be, and how can I debug this issue?
> > >
> > > I have tried both Hadoop 0.20.2 and the latest version of Hadoop from
> > > Yahoo!'s GitHub.
> > > I've monitored the nodes, and there is plenty of free disk space and
> > > memory on all nodes (more than 1 TB of free disk and 5 GB of free memory
> > > at all times on every node).
> > >
> > > Since there are no exceptions or other visible issues, I am finding it
> > > hard to figure out what the problem might be. Could anybody help?
> > >
> > > Thanks,
> > > -aniket
> > >
> >
>
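A toy Java sketch, not Hadoop code, of why speculative execution shows up as "Killed" attempts with no exception in the logs: when a task lags, a duplicate attempt is launched, whichever attempt finishes first wins, and the other is killed. Here `ExecutorService.invokeAny` from `java.util.concurrent` stands in for the JobTracker's scheduling, since it returns the first successful result and cancels the rest. The class name, attempt names, and sleep times are hypothetical.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SpeculativeDemo {
    // Counts attempts that were interrupted (the analogue of a "Killed" attempt).
    static final AtomicInteger killed = new AtomicInteger();

    // Hypothetical task attempt: sleeps to simulate work, then returns its name.
    static Callable<String> attempt(String name, long millis) {
        return () -> {
            try {
                Thread.sleep(millis);
                return name;
            } catch (InterruptedException e) {
                // The losing attempt is cancelled, not failed: the job itself sees no error.
                killed.incrementAndGet();
                throw e;
            }
        };
    }

    static String runWithSpeculation() throws Exception {
        killed.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // The original attempt is a straggler; the speculative duplicate is fast.
            // invokeAny returns the first successful result and cancels the other task.
            return pool.invokeAny(List.of(
                    attempt("attempt_0 (original)", 2000),
                    attempt("attempt_1 (speculative)", 100)));
        } finally {
            pool.shutdownNow();
            // Wait so the cancelled attempt has recorded itself as killed.
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("winner: " + runWithSpeculation());
        System.out.println("killed attempts: " + killed.get());
    }
}
```

The kill is a normal part of the mechanism, which is why turning speculation off removed the killed tasks without improving throughput. In Hadoop 0.20 the per-job switches are the `mapred.map.tasks.speculative.execution` and `mapred.reduce.tasks.speculative.execution` properties (or `JobConf.setReduceSpeculativeExecution(false)`).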
