On Tue, Jul 5, 2011 at 11:45 AM, Juwei Shi <[email protected]> wrote:
> We sometimes have hundreds of map or reduce tasks for a job. I think it is
> hard to find all of them and kill the corresponding JVM processes. If we do
> not want to restart Hadoop, is there any automatic method?
>
> 2011/7/5 <[email protected]>
>
> > Um, kill -9 "pid"?
> >
> > -----Original Message-----
> > From: Juwei Shi [mailto:[email protected]]
> > Sent: Friday, July 01, 2011 10:53 AM
> > To: [email protected]; [email protected]
> > Subject: Jobs are still in running state after executing "hadoop job
> > -kill jobId"
> >
> > Hi,
> >
> > I am facing a problem where jobs are still running after executing
> > "hadoop job -kill jobId". I rebooted the cluster but the job still
> > could not be killed.
> >
> > The Hadoop version is 0.20.2.
> >
> > Any ideas?
> >
> > Thanks in advance!
> >
> > --
> > - Juwei

I do not think they pop up very often, but after days and months of running, orphans can be left alive. The way I would handle it is to write a check that runs over Nagios (NRPE) and uses ps to look for Hadoop task processes older than a certain age, such as 1 day or 1 week. Then you can decide whether you want Nagios to terminate these orphans or do it by hand; roughly what such a check could look like is sketched below.
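The sketch is Python and untested here; the one-day threshold, the --kill flag, and the process pattern are all assumptions you would tune for your cluster (on 0.20.x the task JVMs run org.apache.hadoop.mapred.Child, so that is what it matches on):

#!/usr/bin/env python
# check_hadoop_orphans.py - sketch of an NRPE check that flags Hadoop task
# JVMs older than a threshold and, only if asked to, kills them.
import os
import signal
import subprocess
import sys

MAX_AGE_SECS = 24 * 60 * 60                   # assumed threshold: 1 day
PATTERN = "org.apache.hadoop.mapred.Child"    # task JVM main class in 0.20.x

def etime_to_seconds(etime):
    # ps prints etime as [[dd-]hh:]mm:ss; convert it to plain seconds
    days = 0
    if "-" in etime:
        d, etime = etime.split("-", 1)
        days = int(d)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)                    # pad missing hours
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds

def find_orphans():
    out = subprocess.check_output(["ps", "-eo", "pid,etime,args"]).decode()
    orphans = []
    for line in out.splitlines()[1:]:         # skip the ps header line
        fields = line.split(None, 2)
        if len(fields) == 3 and PATTERN in fields[2]:
            if etime_to_seconds(fields[1]) > MAX_AGE_SECS:
                orphans.append(int(fields[0]))
    return orphans

if __name__ == "__main__":
    orphans = find_orphans()
    if orphans and "--kill" in sys.argv:      # opt-in cleanup, off by default
        for pid in orphans:
            os.kill(pid, signal.SIGKILL)
    if orphans:
        print("CRITICAL - %d orphaned task JVMs: %s" % (len(orphans), orphans))
        sys.exit(2)                           # Nagios CRITICAL exit code
    print("OK - no orphaned Hadoop task processes")
    sys.exit(0)

You would register it in nrpe.cfg with a line like
command[check_hadoop_orphans]=/usr/local/bin/check_hadoop_orphans.py and let
Nagios alert on the CRITICAL; I would only add --kill once the pattern has
proven it never matches healthy tasks.

Edward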
