Sorry for the late response. Seems like you are hitting https://issues.apache.org/jira/browse/HADOOP-11989 on OSX as well.
If you have the Hadoop code base checked out (the exact version you have in your env), would you be able to apply the patch from this bug and test?

By the way, which version of Slider are you using? How was your cluster installed? Vanilla or HDP?

-Gour

On 5/19/15, 4:49 PM, "Timothy Potter" <[email protected]> wrote:

>Hi Gour,
>
>Thanks for your help! Here's the end of the slider.log. And yes, Solr
>was running and in good health.
>
>2015-05-19 15:28:38,898 [2142179715@qtp-1716721394-4] INFO agent.AgentProviderService - publishing PublishedConfiguration{description='Servers' entries = 1}
>2015-05-19 15:28:38,899 [2142179715@qtp-1716721394-4] INFO agent.AgentProviderService - Component operation. Status: COMPLETED; new container state: HEALTHY
>2015-05-19 15:28:38,899 [AmExecutor-006] INFO appmaster.SliderAppMaster - Registering component container_1432005178704_0014_01_000002
>2015-05-19 15:28:38,899 [2142179715@qtp-1716721394-4] INFO agent.AgentProviderService - Updating log and pwd folders for container container_1432005178704_0014_01_000002
>2015-05-19 15:28:38,899 [2142179715@qtp-1716721394-4] INFO agent.AgentProviderService - Updating log and pwd folders for container container_1432005178704_0014_01_000002
>2015-05-19 15:28:38,899 [2142179715@qtp-1716721394-4] INFO agent.AgentProviderService - Starting SOLR on container_1432005178704_0014_01_000002.
>2015-05-19 15:28:38,900 [AmExecutor-006] INFO zk.RegistryOperationsService - Bound at /users/timpotter/services/org-apache-slider/solr/components/container-1432005178704-0014-01-000002 : ServiceRecord{description='SOLR'; external endpoints: {}; internal endpoints: {}, attributes: {"yarn:persistence"="container" "yarn:id"="container-1432005178704-0014-01-000002" }}
>2015-05-19 15:28:40,858 [2142179715@qtp-1716721394-4] INFO agent.AgentProviderService - Component operation. Status: IN_PROGRESS; new container state: HEALTHY
>2015-05-19 15:28:50,876 [2142179715@qtp-1716721394-4] INFO agent.AgentProviderService - Component operation. Status: IN_PROGRESS; new container state: HEALTHY
>2015-05-19 15:28:51,026 [2142179715@qtp-1716721394-4] INFO agent.AgentProviderService - Component operation. Status: COMPLETED; new container state: HEALTHY
>
>
>2015-05-19 15:35:30,949 [IPC Server handler 2 on 1024] INFO api.SliderClusterProtocol - SliderAppMasterApi.stopCluster: stop command issued: exit code = 0, SUCCEEDED: stop command issued;
>2015-05-19 15:35:31,952 [AmExecutor-006] INFO appmaster.SliderAppMaster - SliderAppMasterApi.stopCluster: stop command issued
>2015-05-19 15:35:31,953 [main] INFO appmaster.SliderAppMaster - Triggering shutdown of the AM: stop command issued: exit code = 0, SUCCEEDED: stop command issued;
>2015-05-19 15:35:31,953 [main] INFO appmaster.SliderAppMaster - Process has exited with exit code 0 mapped to 0 -ignoring
>2015-05-19 15:35:31,953 [main] INFO workflow.WorkflowCompositeService - Child service completed Service RoleLaunchService in state RoleLaunchService: STOPPED
>2015-05-19 15:35:31,953 [main] INFO state.AppState - Releasing 2 containers
>2015-05-19 15:35:31,953 [main] INFO state.AppState - Releasing container. Log:
>2015-05-19 15:35:31,953 [main] INFO appmaster.SliderAppMaster - Application completed. Signalling finish to RM
>2015-05-19 15:35:31,954 [main] INFO appmaster.SliderAppMaster - Unregistering AM status=SUCCEEDED message=stop command issued
>2015-05-19 15:35:31,957 [main] INFO impl.AMRMClientImpl - Waiting for application to be successfully unregistered.
>2015-05-19 15:35:32,059 [main] INFO appmaster.SliderAppMaster - Exiting AM; final exit code = 0
>2015-05-19 15:35:32,060 [main] INFO util.ExitUtil - Exiting with status 0
>2015-05-19 15:35:32,061 [Shutdown] INFO mortbay.log - Shutdown hook executing
>2015-05-19 15:35:32,061 [Shutdown] INFO mortbay.log - Stopped [email protected]:52672
>2015-05-19 15:35:32,063 [Shutdown] INFO mortbay.log - Stopped [email protected]:52671
>2015-05-19 15:35:32,065 [Thread-0] INFO mortbay.log - Stopped [email protected]:1025
>2015-05-19 15:35:32,165 [Shutdown] INFO mortbay.log - Shutdown hook complete
>2015-05-19 15:35:32,168 [Thread-0] INFO ipc.Server - Stopping server on 1024
>2015-05-19 15:35:32,168 [IPC Server listener on 1024] INFO ipc.Server - Stopping IPC Server listener on 1024
>2015-05-19 15:35:32,168 [IPC Server Responder] INFO ipc.Server - Stopping IPC Server Responder
>2015-05-19 15:35:32,169 [Thread-0] INFO impl.ContainerManagementProtocolProxy - Opening proxy : 192.168.1.3:52525
>2015-05-19 15:35:32,179 [AmExecutor-005] INFO actions.QueueService - QueueService processor terminated
>2015-05-19 15:35:32,179 [AmExecutor-006] WARN actions.ActionStopQueue - STOP
>2015-05-19 15:35:32,179 [AmExecutor-006] INFO actions.QueueExecutor - Queue Executor run() stopped
>2015-05-19 15:35:32,179 [AMRM Callback Handler Thread] INFO impl.AMRMClientAsyncImpl - Interrupted while waiting for queue
>java.lang.InterruptedException
>    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
>    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2052)
>    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>    at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:274)
>
>On Tue, May 19, 2015 at 5:37 PM, Gour Saha <[email protected]> wrote:
>> Can you send me the Slider AM logs as well? It should be the logs in the
>> container with id container_1432005178704_0014_01_000001.
>>
>> Also,
>> Before you issued the stop command, was Solr up and running and in good
>> health?
>>
>> -Gour
>>
>> On 5/19/15, 2:37 PM, "Timothy Potter" <[email protected]> wrote:
>>
>>>Using 0.72.0 build ...
>>>
>>>I deploy my app successfully, but when I try to stop it using:
>>>
>>>bin/slider stop solr
>>>
>>>It doesn't look like my stop python method is ever called and the
>>>underlying Solr process is not stopped. In the slider-agent.log, I see
>>>this:
>>>
>>>INFO 2015-05-19 15:35:41,571 security.py:132 - Encountered communication error. Details: BadStatusLine("''",)
>>>ERROR 2015-05-19 15:35:41,571 Controller.py:562 - Exception raised
>>>Traceback (most recent call last):
>>>  File "/private/tmp/hadoop-timpotter/nm-local-dir/usercache/timpotter/appcache/application_1432005178704_0014/filecache/66/slider-agent.tar.gz/slider-agent/agent/Controller.py", line 558, in sendRequest
>>>  File "/private/tmp/hadoop-timpotter/nm-local-dir/usercache/timpotter/appcache/application_1432005178704_0014/filecache/66/slider-agent.tar.gz/slider-agent/agent/security.py", line 134, in request
>>>IOError: Error occured during connecting to the server: ''
>>>WARNING 2015-05-19 15:35:41,571 Controller.py:565 - Request failed! Data: {"nodeStatus": {"status": "HEALTHY", "cause": "NONE"}, "timestamp": 1432071341566, "hostname": "container_1432005178704_0014_01_000002___SOLR", "responseId": 46, "fqdn": "Lucids-MacBook-Pro.local", "reports": []}
>>>ERROR 2015-05-19 15:35:45,575 Controller.py:374 - Unable to connect to: https://Lucids-MacBook-Pro.local:52672/ws/v1/slider/agents/container_1432005178704_0014_01_000002___SOLR/heartbeat due to expected string or buffer
>>>ERROR 2015-05-19 15:35:45,575 Controller.py:384 - Heartbeat retry count = 1
>>>INFO 2015-05-19 15:35:55,584 security.py:89 - SSL Connect being called.. connecting to the server
>>>ERROR 2015-05-19 15:35:55,586 Controller.py:562 - Exception raised
>>>Traceback (most recent call last):
>>>  File "/private/tmp/hadoop-timpotter/nm-local-dir/usercache/timpotter/appcache/application_1432005178704_0014/filecache/66/slider-agent.tar.gz/slider-agent/agent/Controller.py", line 556, in sendRequest
>>>  File "/private/tmp/hadoop-timpotter/nm-local-dir/usercache/timpotter/appcache/application_1432005178704_0014/filecache/66/slider-agent.tar.gz/slider-agent/agent/security.py", line 106, in __init__
>>>  File "/private/tmp/hadoop-timpotter/nm-local-dir/usercache/timpotter/appcache/application_1432005178704_0014/filecache/66/slider-agent.tar.gz/slider-agent/agent/security.py", line 111, in connect
>>>  File "/private/tmp/hadoop-timpotter/nm-local-dir/usercache/timpotter/appcache/application_1432005178704_0014/filecache/66/slider-agent.tar.gz/slider-agent/agent/security.py", line 49, in connect
>>>  File "/private/tmp/hadoop-timpotter/nm-local-dir/usercache/timpotter/appcache/application_1432005178704_0014/filecache/66/slider-agent.tar.gz/slider-agent/agent/security.py", line 90, in create_connection
>>>  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 571, in create_connection
>>>    raise err
>>>error: [Errno 61] Connection refused
>>>WARNING 2015-05-19 15:35:55,587 Controller.py:565 - Request failed! Data: {"nodeStatus": {"status": "HEALTHY", "cause": "NONE"}, "timestamp": 1432071341566, "hostname": "container_1432005178704_0014_01_000002___SOLR", "responseId": 46, "fqdn": "Lucids-MacBook-Pro.local", "reports": []}
>>
