And how do we confirm that the slider agents are stopped on each node where a container is allocated? Because even after the stop command, and even after the destroy command, I could see agents apparently still running on all those nodes.
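This is what I see on one of the nodes (checked with something along these lines; the grep pattern is just an assumption based on the agent's command line):

  # List any processes whose command line mentions the slider agent;
  # the bracketed first letter keeps grep from matching itself.
  ps -ef | grep '[s]lider-agent'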
yarn 47909 47907 0 00:37 ? 00:00:00 /bin/bash -c python ./infra/agent/slider-agent/agent/main.py --label container_1428575950531_0013_01_000002___NIMBUS --zk-quorum host1:2181,host2:2181,host3:2181 --zk-reg-path /registry/users/yarn/services/org-apache-slider/storm1 > /var/log/hadoop-yarn/application_1428575950531_0013/container_1428575950531_0013_01_000002/slider-agent.out 2>&1
yarn 47915 47909 0 00:37 ? 00:00:02 python ./infra/agent/slider-agent/agent/main.py --label container_1428575950531_0013_01_000002___NIMBUS --zk-quorum host1:2181,host2:2181,host3:2181 --zk-reg-path /registry/users/yarn/services/org-apache-slider/storm1

Don't these processes correspond to the slider agent?

On Tue, Apr 28, 2015 at 1:32 AM, Chackravarthy Esakkimuthu <[email protected]> wrote:

> 1) slider create storm1
> --- it started all the components: SliderAM and slider agents. The storm UI
> was accessible. I also manually logged in to each host and verified that
> all components were up and running.
>
> 2) slider stop storm1
> --- it stopped the SliderAM
> --- but all the other components kept running, along with the slider
> agents, and the storm UI was still accessible.
>
> 3) slider start storm1 (the RM UI was less responsive during this time)
> --- it started another SliderAM and another set of storm components and
> slider agents. And I was able to access the storm UI on another host.
>
> So two storm clusters are now actually running, even though I used the
> same name "storm1".
>
> On Tue, Apr 28, 2015 at 1:23 AM, Gour Saha <[email protected]> wrote:
>
>> Hmm.. Interesting.
>>
>> Is it possible to run "ps -ef | grep storm" before and after the storm1
>> app is started and send the output?
>>
>> -Gour
>>
>> On 4/27/15, 12:48 PM, "Chackravarthy Esakkimuthu" <[email protected]>
>> wrote:
>>
>> >No, the processes are not old ones, because they show a classpath whose
>> >folder names correspond to the newly launched application id. (Also,
>> >every time before launching a new application, I made sure that all
>> >processes were killed.)
>> >
>> >And the output of the list command is as follows:
>> >
>> >sudo -u yarn /usr/hdp/current/slider-client/bin/./slider list
>> >2015-04-28 01:14:24,568 [main] INFO impl.TimelineClientImpl - Timeline
>> >service address: http://host2:8188/ws/v1/timeline/
>> >2015-04-28 01:14:25,669 [main] INFO client.RMProxy - Connecting to
>> >ResourceManager at host2/XX.XX.XX.XX:8050
>> >storm1 FINISHED application_1428575950531_0013
>> >2015-04-28 01:14:26,108 [main] INFO util.ExitUtil - Exiting with status 0
>> >
>> >On Tue, Apr 28, 2015 at 1:01 AM, Gour Saha <[email protected]> wrote:
>> >
>> >> Sorry, forgot that --containers is supported in the develop branch
>> >> only. Just run list without that option.
>> >>
>> >> It seems like the running processes are stray processes from old
>> >> experimental runs. Can you check the date/time of these processes?
>> >>
>> >> If you bring the storm instance up again, do you see new instances of
>> >> nimbus, supervisor, etc. getting created? The old stray ones will
>> >> probably still be there.
>> >>
>> >> Also, can you run just "slider list" (no other params) and send the
>> >> output?
>> >>
>> >> -Gour
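(A note on the date/time check asked for above: with a standard procps ps, something like the following prints the full start time and age of the matching processes; the grep pattern is just an assumption based on the agent command line shown earlier:)

  # Print PID, full start time, elapsed age, and command line for the
  # agent processes; '[m]ain.py' keeps grep from matching itself.
  ps -eo pid,lstart,etime,args | grep '[m]ain.py'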
>> >> On 4/27/15, 12:20 PM, "Chackravarthy Esakkimuthu" <[email protected]>
>> >> wrote:
>> >>
>> >> >There is some issue with the usage of that command (I tried giving
>> >> >the params in the other order too):
>> >> >
>> >> >sudo -u yarn /usr/hdp/current/slider-client/bin/./slider list storm1
>> >> >--containers
>> >> >
>> >> >2015-04-28 00:42:01,017 [main] ERROR main.ServiceLauncher -
>> >> >com.beust.jcommander.ParameterException: Unknown option: --containers
>> >> >in list storm1 --containers
>> >> >2015-04-28 00:42:01,021 [main] INFO util.ExitUtil - Exiting with
>> >> >status 40
>> >> >
>> >> >Anyway, I issued the stop command and checked the RM UI: the
>> >> >application is stopped and all 5 containers are released. It shows
>> >> >ZERO containers running.
>> >> >
>> >> >But when I log in to those machines, I can see the storm components
>> >> >are still running there (ps -ef | grep storm). The processes are up,
>> >> >and even the storm UI is still accessible.
>> >> >
>> >> >On Tue, Apr 28, 2015 at 12:29 AM, Gour Saha <[email protected]> wrote:
>> >> >
>> >> >> Calling "slider stop" before "slider destroy" is the right order.
>> >> >>
>> >> >> On calling stop, your storm cluster should be completely stopped
>> >> >> (including the Slider AM and all storm components).
>> >> >>
>> >> >> Can you run this command after stop and send the output (don't run
>> >> >> destroy yet)?
>> >> >>
>> >> >> slider list <app-instance-name> --containers
>> >> >>
>> >> >> Also, at this point you should check the RM UI; it should show that
>> >> >> the yarn app is in the stopped state.
>> >> >>
>> >> >> -Gour
>> >> >>
>> >> >> On 4/27/15, 11:52 AM, "Chackravarthy Esakkimuthu" <[email protected]>
>> >> >> wrote:
>> >> >>
>> >> >> >I started storm on yarn (slider create).
>> >> >> >Then I wanted to test whether destroying the storm cluster works
>> >> >> >or not, so I tried the following order:
>> >> >> >
>> >> >> >1) slider stop <app-instance-name>
>> >> >> >-- in this case, only the SliderAM stopped; all the other storm
>> >> >> >daemons, like Nimbus, supervisor, log_viewer, drpc, and UI_Server,
>> >> >> >were still running (along with the slider agents).
>> >> >> >
>> >> >> >Is this just an intermediate state before issuing the destroy
>> >> >> >command?
>> >> >> >
>> >> >> >2) slider destroy <app-instance-name>
>> >> >> >-- in this case, only nimbus and supervisor got killed. The other
>> >> >> >storm daemons (log_viewer, drpc, UI_Server) were still running,
>> >> >> >and the slider agents were still running in all 4 containers too.
>> >> >> >
>> >> >> >I hit this issue on the 0.60 release, then tried the 0.71 release,
>> >> >> >but the same behaviour exists.
>> >> >> >
>> >> >> >Am I using the commands in the wrong way (or in the wrong order)?
>> >> >> >Or does an issue exist?
>> >> >> >
>> >> >> >Thanks in advance!
>> >> >> >
>> >> >> >Thanks,
>> >> >> >Chackra
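(For anyone following this thread later: the order the thread converges on is stop, verify, then destroy, followed by a per-node check for stray processes. A minimal sketch, assuming passwordless ssh to the container hosts; the host list and grep patterns below are illustrative stand-ins, not values from this cluster:)

  #!/bin/bash
  # Sketch: stop and destroy a Slider app instance, then look for stray
  # agent/daemon processes on each node.
  APP=storm1
  HOSTS="host1 host2 host3"

  sudo -u yarn slider stop "$APP"      # should stop the AM and all components
  sudo -u yarn slider list             # verify the instance is no longer RUNNING
  sudo -u yarn slider destroy "$APP"   # then remove the instance definition

  for h in $HOSTS; do
    echo "== $h =="
    ssh "$h" "ps -ef | grep -E '[s]lider-agent|[s]torm'" || echo "no stray processes"
  done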
