Hi Steve,

This is what I see in the AM's log from the time the STOP command was issued. Even though it indicates that the STOP command SUCCEEDED, the stop function in my Python script is not getting executed. Does the exception at the end of this log indicate something?

2015-03-14 07:24:01,202 [IPC Server handler 2 on 39387] INFO appmaster.SliderAppMaster - SliderAppMasterApi.stopCluster: stop command issued: exit code = 0, SUCCEEDED: stop command issued;
2015-03-14 07:24:02,202 [AmExecutor-006] INFO appmaster.SliderAppMaster - SliderAppMasterApi.stopCluster: stop command issued
2015-03-14 07:24:02,202 [main] INFO appmaster.SliderAppMaster - Triggering shutdown of the AM: stop command issued: exit code = 0, SUCCEEDED: stop command issued;
2015-03-14 07:24:02,202 [main] INFO appmaster.SliderAppMaster - Process has exited with exit code 0 mapped to 0 -ignoring
2015-03-14 07:24:02,202 [main] INFO workflow.WorkflowCompositeService - Child service completed Service RoleLaunchService in state RoleLaunchService: STOPPED
2015-03-14 07:24:02,202 [main] INFO state.AppState - Releasing 2 containers
2015-03-14 07:24:02,203 [main] INFO state.AppState - Releasing container. Log: http://bdvs1395.svl.ibm.com:19888/jobhistory/logs/bdvs1395.svl.ibm.com:45454/container_1425452295813_0123_01_000002/ctx/bigsql
2015-03-14 07:24:02,203 [main] INFO state.AppState - Releasing container. Log: http://bdvs1395.svl.ibm.com:19888/jobhistory/logs/bdvs1396.svl.ibm.com:45454/container_1425452295813_0123_01_000003/ctx/bigsql
2015-03-14 07:24:02,204 [main] INFO appmaster.SliderAppMaster - Application completed. Signalling finish to RM
2015-03-14 07:24:02,204 [main] INFO appmaster.SliderAppMaster - Unregistering AM status=SUCCEEDED message=stop command issued
2015-03-14 07:24:02,209 [main] INFO impl.AMRMClientImpl - Waiting for application to be successfully unregistered.
2015-03-14 07:24:02,310 [main] INFO appmaster.SliderAppMaster - Exiting AM; final exit code = 0
2015-03-14 07:24:02,312 [main] INFO util.ExitUtil - Exiting with status 0
2015-03-14 07:24:02,326 [Shutdown] INFO mortbay.log - Shutdown hook executing
2015-03-14 07:24:02,343 [Shutdown] INFO mortbay.log - Stopped [email protected]:45840
2015-03-14 07:24:02,354 [Thread-1] INFO mortbay.log - Stopped [email protected]:0
2015-03-14 07:24:02,355 [Shutdown] INFO mortbay.log - Stopped [email protected]:48056
2015-03-14 07:24:02,358 [Shutdown] INFO mortbay.log - Shutdown hook complete
2015-03-14 07:24:02,364 [Thread-1] INFO ipc.Server - Stopping server on 39387
2015-03-14 07:24:02,365 [IPC Server listener on 39387] INFO ipc.Server - Stopping IPC Server listener on 39387
2015-03-14 07:24:02,366 [IPC Server Responder] INFO ipc.Server - Stopping IPC Server Responder
2015-03-14 07:24:02,367 [Thread-1] INFO impl.ContainerManagementProtocolProxy - Opening proxy : bdvs1395.svl.ibm.com:45454
2015-03-14 07:24:02,383 [Thread-1] INFO impl.ContainerManagementProtocolProxy - Opening proxy : bdvs1396.svl.ibm.com:45454
2015-03-14 07:24:02,429 [AMRM Callback Handler Thread] INFO impl.AMRMClientAsyncImpl - Interrupted while waiting for queue
java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2052)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:274)
2015-03-14 07:24:02,432 [AmExecutor-005] INFO actions.QueueService - QueueService processor terminated
2015-03-14 07:24:02,432 [AmExecutor-006] WARN actions.ActionStopQueue - STOP
2015-03-14 07:24:02,432 [AmExecutor-006] INFO actions.QueueExecutor - Queue Executor run() stopped

Thanks,
Kishore

On Sat, Mar 14, 2015 at 7:28 PM, Steve Loughran <[email protected]> wrote:
>
> Sorry, I think we've been creating confusion
>
> Sumit was referring to the fact that in the app-specific python scripts
> inside an app package, there's a stop operation which isn't implemented;
> the specific component instances currently get destroyed without warning
> when the slider AM hands back the containers to YARN.
>
> The CLI "stop" operation is very much supported, and it should work.
>
> 1. The basic "slider stop cl1" operation is meant to find the running
> application and ask it to shut down. If this doesn't work, can we see (a)
> any stack trace on the client and (b) the tail end of the AM logs.
>
> 2. "slider stop cl1 --force" skips talking to the slider AM and talks to
> YARN direct. No matter what's going on inside the application, this will
> kill it. If it doesn't, there's something gone wrong on the client side
> about talking to YARN, or something very very wrong with the YARN system
> itself. Again, a client-side log will help us review this
>
> -steve
>
>
>
> On 14 Mar 2015, at 07:09, Krishna Kishore Bonagiri <[email protected]> wrote:
> >
> > Hi Sumit,
> > First of all thanks for the reply.
> >
> > What we have been trying is this kind of command from CLI.
> > slider stop cl1
> >
> > So, as you are saying it doesn't yet work. But what is the other way to
> > stop the application? What do you mean by "The only time stop is called,
> > today, is when the application is stopped the Slider Agents call Stop"?
> >
> > Kishore
> >
> > On Sat, Mar 14, 2015 at 10:56 AM, Sumit Mohanty <[email protected]> wrote:
> >
> >> Stop is not wired up to the Stop command from the CLI. The only time stop
> >> is called, today, is when the application is stopped the Slider Agents call
> >> Stop and wait for ~10 seconds before killing the processes.
> >>
> >> On Fri, Mar 13, 2015 at 8:05 PM, Krishna Kishore Bonagiri <[email protected]> wrote:
> >>
> >>> Hi,
> >>>
> >>> We are using Apache Slider 0.60 and implemented the management operations
> >>> start, status, stop, etc. in python script. Everything else is working but
> >>> the stop function is not getting invoked when the container is stopped. Is
> >>> this a known issue already? or is there any trick to make it work?
> >>>
> >>>
> >>> Thanks,
> >>> Kishore
> >>>
> >>
> >>
> >>
> >> --
> >> thanks
> >> Sumit
> >>
> >
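
For reference, the two CLI paths Steve describes (a plain "slider stop cl1", then "slider stop cl1 --force" if that fails) could be scripted roughly as below. This is a minimal sketch, not part of Slider: the helper name stop_application is hypothetical, it assumes the slider launcher is on the PATH, and it treats any non-zero exit code as failure. Note that --force bypasses the AM and asks YARN to kill the application directly, so the AM gets no chance to shut down cleanly.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes a command line and returns its exit code.
Runner = Callable[[Sequence[str]], int]


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code (assumes `slider` is on PATH)."""
    return subprocess.run(list(cmd)).returncode


def stop_application(name: str, run: Runner = _run) -> List[str]:
    """Hypothetical helper: graceful `slider stop`, then `--force` on failure.

    Returns the command lines that were attempted, in order.
    Assumption: any non-zero exit code from `slider` means the attempt failed.
    """
    attempted: List[str] = []

    # 1. Ask the running AM to shut the application down.
    graceful = ["slider", "stop", name]
    attempted.append(" ".join(graceful))
    if run(graceful) == 0:
        return attempted

    # 2. Skip the AM and have YARN kill the application.
    forced = graceful + ["--force"]
    attempted.append(" ".join(forced))
    if run(forced) != 0:
        raise RuntimeError(f"could not stop {name}; check the client-side log")
    return attempted
```

The run parameter exists so the fallback logic can be exercised without a cluster; in real use you would call stop_application("cl1") and keep the client-side output for the list if either step fails.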

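Sumit's description of the agent-side behaviour (call Stop, wait roughly 10 seconds, then kill the processes) is a common stop-then-kill pattern. The sketch below illustrates that pattern using only Python's standard subprocess module. It is not the Slider agent's code, and the function name stop_with_grace and its parameters are hypothetical.

```python
import subprocess
import sys
from typing import Callable


def stop_with_grace(proc: subprocess.Popen, stop_fn: Callable[[], None],
                    grace_seconds: float = 10.0) -> str:
    """Request a stop, then kill the process if it outlives the grace period.

    Hypothetical illustration of the stop-then-kill pattern; not Slider code.
    Returns "stopped" if the process exited within the grace period,
    or "killed" if it had to be killed.
    """
    stop_fn()  # e.g. run the component's own stop logic
    try:
        proc.wait(timeout=grace_seconds)  # give it time to exit cleanly
        return "stopped"
    except subprocess.TimeoutExpired:
        proc.kill()  # forceful kill (SIGKILL on POSIX)
        proc.wait()  # reap the child so no zombie is left behind
        return "killed"


if __name__ == "__main__":
    # A child that would sleep for 30 s; terminate() asks it to stop.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    print(stop_with_grace(child, child.terminate, grace_seconds=5.0))
```

If stop_fn does nothing useful (as with a stop operation that is not implemented), the process is simply killed once the grace period expires, which matches the "destroyed without warning" behaviour Steve mentions.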