We have been seeing jobs that do not stop after executing 'yarn application --kill 
<id>'. The YARN master appears to go through the motions of killing the job, and 
the job is no longer listed by 'yarn application --list'. The job, however, 
continues to process messages, so we have to chase down the rogue process and 
kill it. Once we enter this state, we also have problems with the scripts in 
hadoop/sbin: stop-yarn.sh finds no resource managers, and we have to manually 
'ps' and kill the resource manager processes.
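
For anyone who has to do the same manual cleanup, here is a minimal sketch of the "chase down the rogue process" step. It assumes the leftover container JVMs still carry the application ID somewhere in their command line (for example in a log or working directory path), which may not hold on every cluster. The function names and the sample 'ps' output are hypothetical, not part of YARN or Samza.

```python
import os
import re
import subprocess
from typing import List, Optional


def find_orphan_pids(ps_output: str, app_id: str,
                     exclude_pid: Optional[int] = None) -> List[int]:
    """Return PIDs whose command line mentions app_id.

    ps_output is expected to hold lines of the form "<pid> <command line>",
    as printed by `ps -eo pid,args`. The header line is skipped because it
    does not start with a number.
    """
    pids = []
    for line in ps_output.splitlines():
        match = re.match(r"\s*(\d+)\s+(.*)", line)
        if not match:
            continue
        pid, command = int(match.group(1)), match.group(2)
        # Assumption: the orphaned process has the application ID in its args.
        if app_id in command and pid != exclude_pid:
            pids.append(pid)
    return pids


def list_running_pids(app_id: str) -> List[int]:
    """Run ps on this host and return candidate PIDs for app_id."""
    result = subprocess.run(["ps", "-eo", "pid,args"],
                            capture_output=True, text=True, check=True)
    # Exclude this script, whose own command line may contain app_id.
    return find_orphan_pids(result.stdout, app_id, exclude_pid=os.getpid())
```

Calling list_running_pids("application_1481305736678_0002") on each node manager host would print candidates only; review the list before killing anything, since a plain substring match can also catch unrelated processes such as an editor or a tail on the job's log files.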


We have seen this problem several times now in production. It causes us to 
keep processing messages when we think processing has stopped. Has anyone 
seen this, or does anyone have any insight?


Yarn: 2.6

Samza: 0.10.1


We do see the stack trace below in the Application Master log after killing 
the job in question:


2017-01-23 18:30:26 SamzaAppMaster$ [WARN] Listener org.apache.samza.job.yarn.SamzaAppMasterLifecycle@9fc9f91 failed to shutdown.
org.apache.hadoop.security.token.SecretManager$InvalidToken: appattempt_1481305736678_0002_000001 not found in AMRMTokenSecretManager.
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
        at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
        at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
        at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.finishApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:94)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy16.finishApplicationMaster(Unknown Source)
        at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.unregisterApplicationMaster(AMRMClientImpl.java:378)
        at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.unregisterApplicationMaster(AMRMClientAsyncImpl.java:157)
        at org.apache.samza.job.yarn.SamzaAppMasterLifecycle.onShutdown(SamzaAppMasterLifecycle.scala:63)
        at org.apache.samza.job.yarn.SamzaAppMaster$$anonfun$run$5.apply(SamzaAppMaster.scala:145)
        at org.apache.samza.job.yarn.SamzaAppMaster$$anonfun$run$5.apply(SamzaAppMaster.scala:144)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.samza.job.yarn.SamzaAppMaster$.run(SamzaAppMaster.scala:144)
        at org.apache.samza.job.yarn.SamzaAppMaster$.main(SamzaAppMaster.scala:112)
        at org.apache.samza.job.yarn.SamzaAppMaster.main(SamzaAppMaster.scala)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): appattempt_1481305736678_0002_000001 not found in AMRMTokenSecretManager.
        at org.apache.hadoop.ipc.Client.call(Client.java:1469)
        at org.apache.hadoop.ipc.Client.call(Client.java:1400)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
        at com.sun.proxy.$Proxy15.finishApplicationMaster(Unknown Source)
        at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.finishApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:91)
        ... 16 more






Jeremiah Adams
Software Engineer
www.helixeducation.com
