On Tue, Jul 5, 2011 at 11:45 AM, Juwei Shi <[email protected]> wrote:
> We sometimes have hundreds of map or reduce tasks for a job. I think it is
> hard to find all of them and kill the corresponding JVM processes. If we do
> not want to restart Hadoop, is there any automatic method?
>
> 2011/7/5 <[email protected]>
>
> > Um, kill -9 "pid"?
> >
> > -----Original Message-----
> > From: Juwei Shi [mailto:[email protected]]
> > Sent: Friday, July 01, 2011 10:53 AM
> > To: [email protected]; [email protected]
> > Subject: Jobs are still in running state after executing "hadoop job
> > -kill jobId"
> >
> > Hi,
> >
> > I am facing a problem where jobs are still running after executing
> > "hadoop job -kill jobId". I rebooted the cluster but the job still
> > could not be killed.
> >
> > The Hadoop version is 0.20.2.
> >
> > Any ideas?
> >
> > Thanks in advance!
> >
> > --
> > - Juwei

I do not think they pop up very often, but after days and months of running, orphans can be left alive. The way I would handle it is to write a check that runs over Nagios (NRPE) and uses ps to look for Hadoop task processes older than a certain age, such as 1 day or 1 week. Then you can decide whether you want Nagios to terminate these orphans or do it by hand; roughly what such a check could look like is sketched below.
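The sketch is Python and untested here; the one-day threshold, the --kill flag, and the process pattern are all assumptions you would tune for your cluster (on 0.20.x the task JVMs run org.apache.hadoop.mapred.Child, so that is what it matches on):

#!/usr/bin/env python
# check_hadoop_orphans.py - sketch of an NRPE check that flags Hadoop task
# JVMs older than a threshold and, only if asked to, kills them.
import os
import signal
import subprocess
import sys

MAX_AGE_SECS = 24 * 60 * 60                   # assumed threshold: 1 day
PATTERN = "org.apache.hadoop.mapred.Child"    # task JVM main class in 0.20.x

def etime_to_seconds(etime):
    # ps prints etime as [[dd-]hh:]mm:ss; convert it to plain seconds
    days = 0
    if "-" in etime:
        d, etime = etime.split("-", 1)
        days = int(d)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)                    # pad missing hours
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds

def find_orphans():
    out = subprocess.check_output(["ps", "-eo", "pid,etime,args"]).decode()
    orphans = []
    for line in out.splitlines()[1:]:         # skip the ps header line
        fields = line.split(None, 2)
        if len(fields) == 3 and PATTERN in fields[2]:
            if etime_to_seconds(fields[1]) > MAX_AGE_SECS:
                orphans.append(int(fields[0]))
    return orphans

if __name__ == "__main__":
    orphans = find_orphans()
    if orphans and "--kill" in sys.argv:      # opt-in cleanup, off by default
        for pid in orphans:
            os.kill(pid, signal.SIGKILL)
    if orphans:
        print("CRITICAL - %d orphaned task JVMs: %s" % (len(orphans), orphans))
        sys.exit(2)                           # Nagios CRITICAL exit code
    print("OK - no orphaned Hadoop task processes")
    sys.exit(0)

You would register it in nrpe.cfg with a line like
command[check_hadoop_orphans]=/usr/local/bin/check_hadoop_orphans.py and let
Nagios alert on the CRITICAL; I would only add --kill once the pattern has
proven it never matches healthy tasks.

Edward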
