Thank you Sumit.

On Sat, Mar 14, 2015 at 9:51 PM, Sumit Mohanty <[email protected]> wrote:
> This error is usually harmless. It happens when the application is being
> stopped (slider stop cl1) and Slider Agents may still be heartbeating with
> the AppMaster.
>
> <snip>
> impl.AMRMClientAsyncImpl - Interrupted while waiting for queue
> java.lang.InterruptedException
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2052)
>     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>     at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:274)
> </snip>
>
> What is not implemented is an explicit call to the "stop function in the
> python scripts".
>
> What I was referring to is that the Agent attempts to call stop in the
> python script, but the call is not guaranteed. The reason it is not
> guaranteed is that the call to stop() and the kill of the containers by
> YARN are not coordinated.
>
> In summary, the ability to call stop() functions in the python script is
> not implemented. It's in the plan though.
>
> ________________________________________
> From: Ted Yu <[email protected]>
> Sent: Saturday, March 14, 2015 8:52 AM
> To: [email protected]
> Subject: Re: Apache Slider stop function not working
>
> Kishore:
> Looks like logging was at INFO level.
> Do you mind turning on DEBUG logging?
>
> Thanks
>
> On Sat, Mar 14, 2015 at 7:39 AM, Krishna Kishore Bonagiri <
> [email protected]> wrote:
>
> > Hi Steve,
> >
> > This is what I see in the AM's log after the STOP command is issued. Even
> > though it indicates that the STOP command SUCCEEDED, I see that the stop
> > function in my python script is not getting executed. Does the exception at
> > the end of this log indicate something?
> >
> > 2015-03-14 07:24:01,202 [IPC Server handler 2 on 39387] INFO appmaster.SliderAppMaster - SliderAppMasterApi.stopCluster: stop command issued: exit code = 0, SUCCEEDED: stop command issued;
> > 2015-03-14 07:24:02,202 [AmExecutor-006] INFO appmaster.SliderAppMaster - SliderAppMasterApi.stopCluster: stop command issued
> > 2015-03-14 07:24:02,202 [main] INFO appmaster.SliderAppMaster - Triggering shutdown of the AM: stop command issued: exit code = 0, SUCCEEDED: stop command issued;
> > 2015-03-14 07:24:02,202 [main] INFO appmaster.SliderAppMaster - Process has exited with exit code 0 mapped to 0 -ignoring
> > 2015-03-14 07:24:02,202 [main] INFO workflow.WorkflowCompositeService - Child service completed Service RoleLaunchService in state RoleLaunchService: STOPPED
> > 2015-03-14 07:24:02,202 [main] INFO state.AppState - Releasing 2 containers
> > 2015-03-14 07:24:02,203 [main] INFO state.AppState - Releasing container. Log: http://bdvs1395.svl.ibm.com:19888/jobhistory/logs/bdvs1395.svl.ibm.com:45454/container_1425452295813_0123_01_000002/ctx/bigsql
> > 2015-03-14 07:24:02,203 [main] INFO state.AppState - Releasing container. Log: http://bdvs1395.svl.ibm.com:19888/jobhistory/logs/bdvs1396.svl.ibm.com:45454/container_1425452295813_0123_01_000003/ctx/bigsql
> > 2015-03-14 07:24:02,204 [main] INFO appmaster.SliderAppMaster - Application completed. Signalling finish to RM
> > 2015-03-14 07:24:02,204 [main] INFO appmaster.SliderAppMaster - Unregistering AM status=SUCCEEDED message=stop command issued
> > 2015-03-14 07:24:02,209 [main] INFO impl.AMRMClientImpl - Waiting for application to be successfully unregistered.
> > 2015-03-14 07:24:02,310 [main] INFO appmaster.SliderAppMaster - Exiting AM; final exit code = 0
> > 2015-03-14 07:24:02,312 [main] INFO util.ExitUtil - Exiting with status 0
> > 2015-03-14 07:24:02,326 [Shutdown] INFO mortbay.log - Shutdown hook executing
> > 2015-03-14 07:24:02,343 [Shutdown] INFO mortbay.log - Stopped [email protected]:45840
> > 2015-03-14 07:24:02,354 [Thread-1] INFO mortbay.log - Stopped [email protected]:0
> > 2015-03-14 07:24:02,355 [Shutdown] INFO mortbay.log - Stopped [email protected]:48056
> > 2015-03-14 07:24:02,358 [Shutdown] INFO mortbay.log - Shutdown hook complete
> > 2015-03-14 07:24:02,364 [Thread-1] INFO ipc.Server - Stopping server on 39387
> > 2015-03-14 07:24:02,365 [IPC Server listener on 39387] INFO ipc.Server - Stopping IPC Server listener on 39387
> > 2015-03-14 07:24:02,366 [IPC Server Responder] INFO ipc.Server - Stopping IPC Server Responder
> > 2015-03-14 07:24:02,367 [Thread-1] INFO impl.ContainerManagementProtocolProxy - Opening proxy : bdvs1395.svl.ibm.com:45454
> > 2015-03-14 07:24:02,383 [Thread-1] INFO impl.ContainerManagementProtocolProxy - Opening proxy : bdvs1396.svl.ibm.com:45454
> > 2015-03-14 07:24:02,429 [AMRM Callback Handler Thread] INFO impl.AMRMClientAsyncImpl - Interrupted while waiting for queue
> > java.lang.InterruptedException
> >     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
> >     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2052)
> >     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> >     at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:274)
> > 2015-03-14 07:24:02,432 [AmExecutor-005] INFO actions.QueueService - QueueService processor terminated
> > 2015-03-14 07:24:02,432 [AmExecutor-006] WARN actions.ActionStopQueue - STOP
> > 2015-03-14 07:24:02,432 [AmExecutor-006] INFO actions.QueueExecutor - Queue Executor run() stopped
> >
> >
> > Thanks,
> >
> > Kishore
> >
> >
> >
> > On Sat, Mar 14, 2015 at 7:28 PM, Steve Loughran <[email protected]>
> > wrote:
> >
> > >
> > > Sorry, I think we've been creating confusion.
> > >
> > > Sumit was referring to the fact that in the app-specific python scripts
> > > inside an app package, there's a stop operation which isn't implemented;
> > > the specific component instances currently get destroyed without warning
> > > when the slider AM hands back the containers to YARN.
> > >
> > > The CLI "stop" operation is very much supported, and it should work.
> > >
> > > 1. The basic "slider stop cl1" operation is meant to find the running
> > > application and ask it to shut down. If this doesn't work, can we see (a)
> > > any stack trace on the client and (b) the tail end of the AM logs?
> > >
> > > 2. "slider stop cl1 --force" skips talking to the slider AM and talks to
> > > YARN directly. No matter what's going on inside the application, this will
> > > kill it. If it doesn't, something has gone wrong on the client side
> > > when talking to YARN, or something is very, very wrong with the YARN system
> > > itself. Again, a client-side log will help us review this.
> > >
> > > -steve
> > >
> > >
> > > > On 14 Mar 2015, at 07:09, Krishna Kishore Bonagiri <
> > > [email protected]> wrote:
> > > >
> > > > Hi Sumit,
> > > > First of all, thanks for the reply.
> > > >
> > > > What we have been trying is this kind of command from the CLI:
> > > > slider stop cl1
> > > >
> > > > So, as you are saying, it doesn't work yet. But what is the other way to
> > > > stop the application? What do you mean by "The only time stop is called,
> > > > today, is when the application is stopped the Slider Agents call Stop"?
> > > >
> > > > Kishore
> > > >
> > > > On Sat, Mar 14, 2015 at 10:56 AM, Sumit Mohanty <[email protected]>
> > > > wrote:
> > > >
> > > >> Stop is not wired up to the Stop command from the CLI. The only time stop
> > > >> is called, today, is when the application is stopped the Slider Agents call
> > > >> Stop and wait for ~10 seconds before killing the processes.
> > > >>
> > > >> On Fri, Mar 13, 2015 at 8:05 PM, Krishna Kishore Bonagiri <
> > > >> [email protected]> wrote:
> > > >>
> > > >>> Hi,
> > > >>>
> > > >>> We are using Apache Slider 0.60 and implemented the management operations
> > > >>> start, status, stop, etc. in python script. Everything else is working but
> > > >>> the stop function is not getting invoked when the container is stopped. Is
> > > >>> this a known issue already? or is there any trick to make it work?
> > > >>>
> > > >>>
> > > >>> Thanks,
> > > >>> Kishore
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> thanks
> > > >> Sumit
> > > >>
> > > >
> > >
> >
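
The behaviour Sumit describes at the bottom of the thread (call Stop, wait for ~10 seconds, then kill the processes) is a common stop-then-kill pattern. Below is a minimal Python sketch of that pattern for a single child process. It is not Slider's agent code, and it sends a terminate signal in place of the agent's Stop call; the function name `stop_with_grace`, the `grace_seconds` parameter, and the "graceful"/"killed" return values are hypothetical names chosen for illustration. Only the ~10 second figure comes from the thread.

```python
import subprocess


def stop_with_grace(proc, grace_seconds=10.0):
    """Ask a child process to exit, then force-kill it if it does not.

    Illustrative only: this mirrors the "call stop, wait ~10 seconds,
    then kill" behaviour described in the thread, not Slider's code.
    Returns "graceful" if the process exited within the grace period,
    or "killed" if it had to be force-killed.
    """
    # Polite request to exit (SIGTERM on POSIX).
    proc.terminate()
    try:
        # Give the process a bounded amount of time to clean up.
        proc.wait(timeout=grace_seconds)
        return "graceful"
    except subprocess.TimeoutExpired:
        # Grace period elapsed: force the process down (SIGKILL on POSIX).
        proc.kill()
        proc.wait()  # reap the child so it does not linger as a zombie
        return "killed"
```

For example, passing a `subprocess.Popen` object for a long-running child to `stop_with_grace(child)` returns "graceful" if the child exits on the first request. A child that ignores the request is killed once the grace period runs out, which is why cleanup that depends on a stop hook running to completion cannot be relied on.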
