This error is usually harmless. It happens when the application is being stopped (slider stop cl1) while the Slider Agents may still be heartbeating with the AppMaster.
<snip>
> impl.AMRMClientAsyncImpl - Interrupted while waiting for queue
> java.lang.InterruptedException
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2052)
>     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>     at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:274)
</snip>

What is not implemented is an explicit call to "stop function in the python scripts". What I was referring to is that the Agent attempts to call stop in the python script, but that call is not guaranteed. The reason it is not guaranteed is that the call to stop() and the kill of the containers by YARN are not co-ordinated.

In summary, the ability to call stop() functions in the python script is not implemented. It's in the plan though.

________________________________________
From: Ted Yu <[email protected]>
Sent: Saturday, March 14, 2015 8:52 AM
To: [email protected]
Subject: Re: Apache Slider stop function not working

Kishore:
Looks like logging was at INFO level. Do you mind turning on DEBUG logging?

Thanks

On Sat, Mar 14, 2015 at 7:39 AM, Krishna Kishore Bonagiri <[email protected]> wrote:

> Hi Steve,
>
> This is what I see in the AM's log since the STOP command is issued. Even
> though it indicates that STOP command SUCCEEDED, I see that the stop
> function in my python script is not getting executed. Does the exception at
> the end of this log indicate something?
>
> 2015-03-14 07:24:01,202 [IPC Server handler 2 on 39387] INFO appmaster.SliderAppMaster - SliderAppMasterApi.stopCluster: stop command issued: exit code = 0, SUCCEEDED: stop command issued;
> 2015-03-14 07:24:02,202 [AmExecutor-006] INFO appmaster.SliderAppMaster - SliderAppMasterApi.stopCluster: stop command issued
> 2015-03-14 07:24:02,202 [main] INFO appmaster.SliderAppMaster - Triggering shutdown of the AM: stop command issued: exit code = 0, SUCCEEDED: stop command issued;
> 2015-03-14 07:24:02,202 [main] INFO appmaster.SliderAppMaster - Process has exited with exit code 0 mapped to 0 -ignoring
> 2015-03-14 07:24:02,202 [main] INFO workflow.WorkflowCompositeService - Child service completed Service RoleLaunchService in state RoleLaunchService: STOPPED
> 2015-03-14 07:24:02,202 [main] INFO state.AppState - Releasing 2 containers
> 2015-03-14 07:24:02,203 [main] INFO state.AppState - Releasing container. Log: http://bdvs1395.svl.ibm.com:19888/jobhistory/logs/bdvs1395.svl.ibm.com:45454/container_1425452295813_0123_01_000002/ctx/bigsql
> 2015-03-14 07:24:02,203 [main] INFO state.AppState - Releasing container. Log: http://bdvs1395.svl.ibm.com:19888/jobhistory/logs/bdvs1396.svl.ibm.com:45454/container_1425452295813_0123_01_000003/ctx/bigsql
> 2015-03-14 07:24:02,204 [main] INFO appmaster.SliderAppMaster - Application completed. Signalling finish to RM
> 2015-03-14 07:24:02,204 [main] INFO appmaster.SliderAppMaster - Unregistering AM status=SUCCEEDED message=stop command issued
> 2015-03-14 07:24:02,209 [main] INFO impl.AMRMClientImpl - Waiting for application to be successfully unregistered.
> 2015-03-14 07:24:02,310 [main] INFO appmaster.SliderAppMaster - Exiting AM; final exit code = 0
> 2015-03-14 07:24:02,312 [main] INFO util.ExitUtil - Exiting with status 0
> 2015-03-14 07:24:02,326 [Shutdown] INFO mortbay.log - Shutdown hook executing
> 2015-03-14 07:24:02,343 [Shutdown] INFO mortbay.log - Stopped [email protected]:45840
> 2015-03-14 07:24:02,354 [Thread-1] INFO mortbay.log - Stopped [email protected]:0
> 2015-03-14 07:24:02,355 [Shutdown] INFO mortbay.log - Stopped [email protected]:48056
> 2015-03-14 07:24:02,358 [Shutdown] INFO mortbay.log - Shutdown hook complete
> 2015-03-14 07:24:02,364 [Thread-1] INFO ipc.Server - Stopping server on 39387
> 2015-03-14 07:24:02,365 [IPC Server listener on 39387] INFO ipc.Server - Stopping IPC Server listener on 39387
> 2015-03-14 07:24:02,366 [IPC Server Responder] INFO ipc.Server - Stopping IPC Server Responder
> 2015-03-14 07:24:02,367 [Thread-1] INFO impl.ContainerManagementProtocolProxy - Opening proxy : bdvs1395.svl.ibm.com:45454
> 2015-03-14 07:24:02,383 [Thread-1] INFO impl.ContainerManagementProtocolProxy - Opening proxy : bdvs1396.svl.ibm.com:45454
> 2015-03-14 07:24:02,429 [AMRM Callback Handler Thread] INFO impl.AMRMClientAsyncImpl - Interrupted while waiting for queue
> java.lang.InterruptedException
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2052)
>     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>     at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:274)
> 2015-03-14 07:24:02,432 [AmExecutor-005] INFO actions.QueueService - QueueService processor terminated
> 2015-03-14 07:24:02,432 [AmExecutor-006] WARN actions.ActionStopQueue - STOP
> 2015-03-14 07:24:02,432 [AmExecutor-006] INFO actions.QueueExecutor - Queue Executor run() stopped
>
>
> Thanks,
> Kishore
>
> On Sat, Mar 14, 2015 at 7:28 PM, Steve Loughran <[email protected]> wrote:
>
> > Sorry, I think we've been creating confusion
> >
> > Sumit was referring to the fact that in the app-specific python scripts
> > inside an app package, there's a stop operation which isn't implemented;
> > the specific component instances currently get destroyed without warning
> > when the slider AM hands back the containers to YARN.
> >
> > The CLI "stop" operation is very much supported, and it should work.
> >
> > 1. The basic "slider stop cl1" operation is meant to find the running
> > application and ask it to shut down. If this doesn't work, can we see (a)
> > any stack trace on the client and (b) the tail end of the AM logs.
> >
> > 2. "slider stop cl1 --force" skips talking to the slider AM and talks to
> > YARN direct. No matter what's going on inside the application, this will
> > kill it. If it doesn't, there's something gone wrong on the client side
> > about talking to YARN, or something very very wrong with the YARN system
> > itself. Again, a client-side log will help us review this
> >
> > -steve
> >
> >
> > > On 14 Mar 2015, at 07:09, Krishna Kishore Bonagiri <[email protected]> wrote:
> > >
> > > Hi Sumit,
> > > First of all thanks for the reply.
> > >
> > > What we have been trying is this kind of command from CLI.
> > > slider stop cl1
> > >
> > > So, as you are saying it doesn't yet work. But what is the other way to
> > > stop the application? What do you mean by "The only time stop is called,
> > > today, is when the application is stopped the Slider Agents call Stop"?
> > >
> > > Kishore
> > >
> > > On Sat, Mar 14, 2015 at 10:56 AM, Sumit Mohanty <[email protected]> wrote:
> > >
> > >> Stop is not wired up to the Stop command from the CLI. The only time stop
> > >> is called, today, is when the application is stopped the Slider Agents call
> > >> Stop and wait for ~10 seconds before killing the processes.
> > >>
> > >> On Fri, Mar 13, 2015 at 8:05 PM, Krishna Kishore Bonagiri <[email protected]> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> We are using Apache Slider 0.60 and implemented the management operations
> > >>> start, status, stop, etc. in python script. Everything else is working but
> > >>> the stop function is not getting invoked when the container is stopped. Is
> > >>> this a known issue already? or is there any trick to make it work?
> > >>>
> > >>>
> > >>> Thanks,
> > >>> Kishore
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> thanks
> > >> Sumit
> > >>
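
For readers following the thread, the "call Stop and wait for ~10 seconds before killing the processes" behaviour described above can be pictured roughly as below. This is only a sketch of the general stop-then-kill pattern, not Slider's agent code: the function name stop_then_kill, the request_stop callback (standing in for whatever invokes the script's stop()), and the grace_seconds parameter are all made up for illustration. It also hints at why stop() is not guaranteed to run: if something else (here, YARN reclaiming the container) kills the process first, the polite request never gets its chance.

```python
import subprocess


def stop_then_kill(proc, request_stop=None, grace_seconds=10.0):
    """Ask `proc` to stop; force-kill it if it is still running after the grace period.

    `proc` is a subprocess.Popen object. `request_stop` is a hypothetical
    zero-argument callable that asks the process to shut down; it defaults
    to proc.terminate (SIGTERM on POSIX).

    Returns "stopped" if the process exited within the grace period,
    "killed" if it had to be killed.
    """
    if request_stop is None:
        request_stop = proc.terminate
    request_stop()  # polite request; the process may clean up before exiting
    try:
        proc.wait(timeout=grace_seconds)
        return "stopped"
    except subprocess.TimeoutExpired:
        proc.kill()  # hard kill; no further chance for cleanup
        proc.wait()  # reap the child so it does not linger as a zombie
        return "killed"
```

Calling stop_then_kill(proc) on a child started with subprocess.Popen returns "stopped" when the child exits in time and "killed" when the grace period runs out.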
