Re: Samza job killed but left orphaned on YARN

David Yu Thu, 19 May 2016 14:34:08 -0700

I just stumbled upon this post, and it seems to be the same issue:

https://issues.apache.org/jira/browse/SAMZA-498



We followed the suggested fix of creating a wrapper kill script, and everything works now.
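The script attached to SAMZA-498 is not reproduced here. As a hedged, generic illustration of the idea behind such a wrapper (making sure a SIGTERM delivered to the launching shell actually reaches the JVM that shell started), a minimal bash sketch might look like this. The function name `run_forwarding_signals` is hypothetical and is not part of Samza or YARN.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not the SAMZA-498 script): run a command as a child
# process and forward SIGTERM/SIGINT to it, so that signalling the wrapper
# also stops the child instead of leaving it orphaned.
run_forwarding_signals() {
  "$@" &                 # start the real process (e.g. a JVM) in the background
  local child=$!
  local status=0

  # When the wrapper itself is signalled, pass SIGTERM on to the child.
  trap 'kill -TERM "$child" 2>/dev/null' TERM INT

  # bash's wait returns early (status > 128) when a trapped signal arrives,
  # so keep waiting until the child has actually exited.
  wait "$child" || status=$?
  while kill -0 "$child" 2>/dev/null; do
    status=0
    wait "$child" || status=$?
  done

  trap - TERM INT        # restore default signal handling
  return "$status"
}
```

Usage would be `run_forwarding_signals <command> [args...]`. A common alternative with the same goal is to `exec` the JVM from the launch script, so that the JVM replaces the shell and receives the signal directly.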

Is there a plan to fix this in the next version of Samza?

Thanks,
David

On Wed, May 18, 2016 at 11:53 AM, Jacob Maes <jacob.m...@gmail.com> wrote:

> Hmm, could there be something in your job holding up the container shutdown
> process? Perhaps something ignoring SIGTERM/Thread.interrupt, by chance?
>
> Also, I think there's a YARN property specifying the amount of time the NM
> waits between sending a SIGTERM and a SIGKILL, though I can't find it at
> the moment.
>
> -Jake
>
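The property Jake mentions appears to be `yarn.nodemanager.sleep-delay-before-sigkill.ms`, which Hadoop 2.x `yarn-default.xml` describes as the time the NM waits between sending SIGTERM and SIGKILL to a container (default 250 ms); the name and default are worth confirming for the CDH release in use. The same TERM-then-KILL escalation can be reproduced by hand for a leftover process with the bash sketch below. `stop_with_grace` is a hypothetical helper, not something YARN ships, and it counts whole seconds rather than milliseconds for simplicity.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop_with_grace PID GRACE_SECONDS
# Sends SIGTERM, waits up to GRACE_SECONDS for the process to exit, then
# sends SIGKILL. Returns 0 if SIGTERM was enough, 1 if SIGKILL was needed.
stop_with_grace() {
  local pid=$1 grace=$2 waited=0

  # Nothing to do if the process is already gone.
  kill -TERM "$pid" 2>/dev/null || return 0

  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$grace" ]; then
      echo "pid $pid still alive ${grace}s after SIGTERM; sending SIGKILL" >&2
      kill -KILL "$pid" 2>/dev/null || true
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

A process that traps or ignores SIGTERM (Jake's first hypothesis) ends up on the SIGKILL branch, while one that shuts down cleanly returns 0.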
> On Wed, May 18, 2016 at 10:32 AM, David Yu <david...@optimizely.com>
> wrote:
>
> > From the NM log, I'm seeing:
> >
> > 2016-05-18 06:29:06,248 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_e01_1463512986427_0007_01_000002
> > 2016-05-18 06:29:06,265 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application *application_1463512986427_0007* transitioned from RUNNING to FINISHING_CONTAINERS_WAIT
> >
> > (The *highlighted* ID is the Samza application in question.)
> >
> > The status never transitions out of FINISHING_CONTAINERS_WAIT :(
> >
> >
> >
> > On Wed, May 18, 2016 at 10:21 AM, David Yu <david...@optimizely.com>
> > wrote:
> >
> > > Jacob,
> > >
> > > I have checked and confirmed that the NM is running on the node:
> > >
> > > $ ps aux | grep java
> > > ...
> > > yarn     25623  0.5  0.8 2366536 275488 ?      Sl   May17   7:04
> > > /usr/java/jdk1.8.0_51/bin/java -Dproc_nodemanager
> > >  ... org.apache.hadoop.yarn.server.nodemanager.NodeManager
> > >
> > >
> > >
> > > Thanks,
> > > David
> > >
> > > On Wed, May 18, 2016 at 7:08 AM, Jacob Maes <jacob.m...@gmail.com>
> > wrote:
> > >
> > >> Hey David,
> > >>
> > >> The only time I've seen orphaned containers is when the NM dies. If the
> > >> NM isn't running, the RM has no means to kill the containers on a node.
> > >> Can you verify that the NM was healthy at the time of the shutdown?
> > >>
> > >> If it wasn't healthy and/or it was restarted, one option that may help
> > >> is NM Recovery:
> > >>
> > >> https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/NodeManagerRestart.html
> > >>
> > >> With NM Recovery, the NM will resume control over containers that were
> > >> running when the NM shut down. This option has virtually eliminated
> > >> orphaned containers in our clusters.
> > >>
> > >> -Jake
> > >>
> > >> On Tue, May 17, 2016 at 11:54 PM, David Yu <david...@optimizely.com>
> > >> wrote:
> > >>
> > >> > Samza version = 0.10.0
> > >> > YARN version = Hadoop 2.6.0-cdh5.4.9
> > >> >
> > >> > We are experiencing issues when killing a Samza job:
> > >> >
> > >> > $ yarn application -kill application_1463512986427_0007
> > >> >
> > >> > Killing application application_1463512986427_0007
> > >> >
> > >> > 16/05/18 06:29:05 INFO impl.YarnClientImpl: Killed application application_1463512986427_0007
> > >> >
> > >> > The RM shows that the job was killed. However, the Samza containers are
> > >> > still left running.
> > >> >
> > >> > Any idea why this is happening?
> > >> >
> > >> > Thanks,
> > >> > David
> > >> >
> > >>
> > >
> > >
> >
>
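To check on a given node whether any processes belonging to the killed application are still alive, one option is to scan process command lines for the application ID. The sketch below is Linux-only and reads `/proc` directly. It assumes that a container's command line contains the application ID (for example through log or working-directory paths under the NodeManager's local dirs), which should be verified on the cluster in question; `list_app_pids` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (Linux only): list_app_pids PATTERN
# Prints the PID of every process whose command line contains PATTERN,
# e.g. a YARN application ID such as application_1463512986427_0007.
list_app_pids() {
  local pattern=$1 dir pid cmd
  for dir in /proc/[0-9]*; do
    pid=${dir#/proc/}
    # Skip the shell running this function (and its subshell, if any).
    if [ "$pid" = "$$" ] || [ "$pid" = "$BASHPID" ]; then
      continue
    fi
    # /proc/PID/cmdline is NUL-separated; turn NULs into spaces. A process
    # may exit mid-scan, so errors for vanished entries are discarded.
    { cmd=$(tr '\0' ' ' < "$dir/cmdline"); } 2>/dev/null || cmd=''
    case $cmd in
      *"$pattern"*) echo "$pid" ;;
    esac
  done
}
```

On an affected node, `list_app_pids application_1463512986427_0007` would print any surviving PIDs, which can then be inspected with `ps -fp <pid>` before deciding how to stop them.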
