Just stumbled upon this post, and it seems to be the same issue: https://issues.apache.org/jira/browse/SAMZA-498
We followed the fix to create a wrapper kill script and everything works. Do we have a plan to fix this in the next version of Samza?

Thanks,
David

On Wed, May 18, 2016 at 11:53 AM, Jacob Maes <jacob.m...@gmail.com> wrote:

> Hmm, could there be something in your job holding up the container
> shutdown process? Perhaps something ignoring SIGTERM/Thread.interrupt,
> by chance?
>
> Also, I think there's a YARN property specifying the amount of time the
> NM waits between sending a SIGTERM and a SIGKILL, though I can't find it
> at the moment.
>
> -Jake
>
> On Wed, May 18, 2016 at 10:32 AM, David Yu <david...@optimizely.com>
> wrote:
>
> > From the NM log, I'm seeing:
> >
> > 2016-05-18 06:29:06,248 INFO
> > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
> > Cleaning up container container_e01_1463512986427_0007_01_000002
> > 2016-05-18 06:29:06,265 INFO
> > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
> > Application *application_1463512986427_0007* transitioned from RUNNING
> > to FINISHING_CONTAINERS_WAIT
> >
> > (*Highlighted* is the particular samza application.)
> >
> > The status never transitioned from FINISHING_CONTAINERS_WAIT :(
> >
> > On Wed, May 18, 2016 at 10:21 AM, David Yu <david...@optimizely.com>
> > wrote:
> >
> > > Jacob,
> > >
> > > I have checked and made sure that NM is running on the node:
> > >
> > > $ ps aux | grep java
> > > ...
> > > yarn 25623 0.5 0.8 2366536 275488 ? Sl May17 7:04
> > > /usr/java/jdk1.8.0_51/bin/java -Dproc_nodemanager
> > > ... org.apache.hadoop.yarn.server.nodemanager.NodeManager
> > >
> > > Thanks,
> > > David
> > >
> > > On Wed, May 18, 2016 at 7:08 AM, Jacob Maes <jacob.m...@gmail.com>
> > > wrote:
> > >
> > >> Hey David,
> > >>
> > >> The only time I've seen orphaned containers is when the NM dies. If
> > >> the NM isn't running, the RM has no means to kill the containers on
> > >> a node. Can you verify that the NM was healthy at the time of the
> > >> shut down?
> > >>
> > >> If it wasn't healthy and/or it was restarted, one option that may
> > >> help is NM Recovery:
> > >>
> > >> https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/NodeManagerRestart.html
> > >>
> > >> With NM Recovery, the NM will resume control over containers that
> > >> were running when the NM shut down. This option has virtually
> > >> eliminated orphaned containers in our clusters.
> > >>
> > >> -Jake
> > >>
> > >> On Tue, May 17, 2016 at 11:54 PM, David Yu <david...@optimizely.com>
> > >> wrote:
> > >>
> > >> > Samza version = 0.10.0
> > >> > YARN version = Hadoop 2.6.0-cdh5.4.9
> > >> >
> > >> > We are experiencing issues when killing a Samza job:
> > >> >
> > >> > $ yarn application -kill application_1463512986427_0007
> > >> >
> > >> > Killing application application_1463512986427_0007
> > >> >
> > >> > 16/05/18 06:29:05 INFO impl.YarnClientImpl: Killed application
> > >> > application_1463512986427_0007
> > >> >
> > >> > RM shows that the job is killed. However, the samza containers are
> > >> > still left running.
> > >> >
> > >> > Any idea why this is happening?
> > >> >
> > >> > Thanks,
> > >> > David
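
For readers following the SIGTERM/SIGKILL discussion in this thread, here is a minimal Python sketch of the escalation pattern Jake describes: send SIGTERM, wait a grace period, then send SIGKILL if the process is still alive. It is only an illustration and is not YARN's or Samza's code. The names `spawn_child`, `stop_process`, and `grace_seconds` are hypothetical, and the child is a stand-in for a job that ignores SIGTERM. The YARN property Jake could not recall is probably `yarn.nodemanager.sleep-delay-before-sigkill.ms`, but check the yarn-default.xml for your Hadoop version. The sketch assumes a POSIX system.

```python
import signal
import subprocess
import sys


def spawn_child(ignore_sigterm):
    """Start a sleeping child process; optionally make it ignore SIGTERM."""
    setup = "signal.signal(signal.SIGTERM, signal.SIG_IGN); " if ignore_sigterm else ""
    code = f"import signal, time; {setup}print('ready', flush=True); time.sleep(60)"
    proc = subprocess.Popen([sys.executable, "-c", code],
                            stdout=subprocess.PIPE, text=True)
    # Block until the child reports that its signal setup is done.
    proc.stdout.readline()
    return proc


def stop_process(proc, grace_seconds=0.25):
    """Send SIGTERM, wait grace_seconds, then fall back to SIGKILL.

    Returns the name of the signal that actually ended the process.
    """
    proc.terminate()  # SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_seconds)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL cannot be caught or ignored
        proc.wait()
        return "SIGKILL"
    finally:
        proc.stdout.close()


if __name__ == "__main__":
    stubborn = spawn_child(ignore_sigterm=True)
    print("stubborn child ended by", stop_process(stubborn, grace_seconds=0.5))
```

A child that ignores SIGTERM survives the first signal and is only removed by the SIGKILL fallback. If nothing ever delivers that fallback to the real process, it keeps running, which matches the orphaned containers described above.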