To dig deeper, I would need to get hold of the Slider AM log (slider.log) and at least one of the agent logs (slider-agent.log), say the one for Nimbus.

They will be under /hadoop/yarn/log/<app_id>/<container_id>/, or, if the <app_id> directory under /hadoop/yarn/log is missing, you can run "yarn logs -applicationId <app_id>" and dump the output to a file.

It would also help if you could provide the NodeManager logs. They are under /var/log/hadoop-yarn/yarn/, with file names of the form yarn-yarn-nodemanager-<hostname>.log.
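For example, something along these lines should collect everything in one place (the application id below is just the one from your earlier runs, and the paths are the ones mentioned above; adjust both to match your cluster):

  # Dump the aggregated logs for the whole app into one file
  yarn logs -applicationId application_1428575950531_0013 > storm1-app.log

  # If the app directory is still on the node, read the raw logs directly
  ls /hadoop/yarn/log/application_1428575950531_0013/

  # NodeManager log on the host where the container ran
  less /var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-$(hostname).log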
-Gour

On 4/27/15, 1:32 PM, "Chackravarthy Esakkimuthu" <[email protected]> wrote:

> Run "slider start storm1" again, it should create application_1428575950531_0014 (with id 0014).
> ---> yes it does
>
> After that can you check if the processes from application_1428575950531_0013 are still running?
> ---> yes
>
> If yes, then run "slider stop storm1" again and then do you see processes from both application_1428575950531_0013 and application_1428575950531_0014 running?
> ---> yes, both are running, and I am able to access both storm UIs as well (only the SliderAM was stopped).
>
> On Tue, Apr 28, 2015 at 1:54 AM, Gour Saha <[email protected]> wrote:
>
>> Yes, those processes correspond to the slider agent.
>>
>> Based on the issue you are facing, let's do this -
>>
>> Run "slider start storm1" again, it should create application_1428575950531_0014 (with id 0014). After that can you check if the processes from application_1428575950531_0013 are still running? If yes, then run "slider stop storm1" again and then do you see processes from both application_1428575950531_0013 and application_1428575950531_0014 running?
>>
>> -Gour
>>
>> On 4/27/15, 1:11 PM, "Chackravarthy Esakkimuthu" <[email protected]> wrote:
>>
>>> And how do we confirm that the slider agents are stopped on each node where a container was allocated? Because even after the stop command, and even after the destroy command, the agents seem to be running on all those nodes:
>>>
>>> yarn 47909 47907 0 00:37 ? 00:00:00 /bin/bash -c python ./infra/agent/slider-agent/agent/main.py --label container_1428575950531_0013_01_000002___NIMBUS --zk-quorum host1:2181,host2:2181,host3:2181 --zk-reg-path /registry/users/yarn/services/org-apache-slider/storm1 > /var/log/hadoop-yarn/application_1428575950531_0013/container_1428575950531_0013_01_000002/slider-agent.out 2>&1
>>> yarn 47915 47909 0 00:37 ? 00:00:02 python ./infra/agent/slider-agent/agent/main.py --label container_1428575950531_0013_01_000002___NIMBUS --zk-quorum host1:2181,host2:2181,host3:2181 --zk-reg-path /registry/users/yarn/services/org-apache-slider/storm1
>>>
>>> Don't these processes correspond to the slider agent?
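As noted above, those are the agent processes. A quick way to tie a leftover agent back to a specific app instance is to grep the process list for the application id, since the agent's --label embeds the container id (the id below is the one from the ps output above):

  # Every process still referencing the supposedly stopped app instance
  ps -ef | grep '[a]pplication_1428575950531_0013'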
>>>
>>> On Tue, Apr 28, 2015 at 1:32 AM, Chackravarthy Esakkimuthu <[email protected]> wrote:
>>>
>>>> 1) slider create storm1
>>>> --- it started all the components, SliderAM and slider agents, and the storm UI was accessible. I also manually logged in to each host and verified that all components were up and running.
>>>>
>>>> 2) slider stop storm1
>>>> --- it stopped the SliderAM
>>>> --- but all the components kept running, along with the slider agents, and the storm UI was still accessible.
>>>>
>>>> 3) slider start storm1 (the RM UI was less responsive during this time)
>>>> --- it started another SliderAM and another set of storm components and slider agents, and I am able to access a storm UI on another host.
>>>>
>>>> So now two storm clusters are actually running, even though I used the same name "storm1".
>>>>
>>>> On Tue, Apr 28, 2015 at 1:23 AM, Gour Saha <[email protected]> wrote:
>>>>
>>>>> Hmm.. Interesting.
>>>>>
>>>>> Is it possible to run "ps -ef | grep storm" before and after the storm1 app is started and send the output?
>>>>>
>>>>> -Gour
>>>>>
>>>>> On 4/27/15, 12:48 PM, "Chackravarthy Esakkimuthu" <[email protected]> wrote:
>>>>>
>>>>>> No, the processes are not old ones, because their classpaths contain folder names corresponding to the newly launched application id. (Also, every time before launching a new application, I made sure that all processes were killed.)
>>>>>>
>>>>>> And the output of the list command is as follows:
>>>>>>
>>>>>> sudo -u yarn /usr/hdp/current/slider-client/bin/./slider list
>>>>>> 2015-04-28 01:14:24,568 [main] INFO impl.TimelineClientImpl - Timeline service address: http://host2:8188/ws/v1/timeline/
>>>>>> 2015-04-28 01:14:25,669 [main] INFO client.RMProxy - Connecting to ResourceManager at host2/XX.XX.XX.XX:8050
>>>>>> storm1    FINISHED    application_1428575950531_0013
>>>>>> 2015-04-28 01:14:26,108 [main] INFO util.ExitUtil - Exiting with status 0
>>>>>>
>>>>>> On Tue, Apr 28, 2015 at 1:01 AM, Gour Saha <[email protected]> wrote:
>>>>>>
>>>>>>> Sorry, forgot that --containers is supported in the develop branch only. Just run list without that option.
>>>>>>>
>>>>>>> It seems like the running processes are stray processes from old experimental runs. Can you check the date/time of these processes?
>>>>>>>
>>>>>>> If you bring the storm instance up again, do you see new instances of nimbus, supervisor, etc. getting created? The old stray ones will probably still be there.
>>>>>>>
>>>>>>> Also, can you run just "slider list" (no other params) and send the output?
>>>>>>>
>>>>>>> -Gour
>>>>>>>
>>>>>>> On 4/27/15, 12:20 PM, "Chackravarthy Esakkimuthu" <[email protected]> wrote:
>>>>>>>
>>>>>>>> There is some issue with that command's usage (I tried giving the params in a different order as well):
>>>>>>>>
>>>>>>>> sudo -u yarn /usr/hdp/current/slider-client/bin/./slider list storm1 --containers
>>>>>>>> 2015-04-28 00:42:01,017 [main] ERROR main.ServiceLauncher - com.beust.jcommander.ParameterException: Unknown option: --containers in list storm1 --containers
>>>>>>>> 2015-04-28 00:42:01,021 [main] INFO util.ExitUtil - Exiting with status 40
>>>>>>>>
>>>>>>>> Anyway, I issued the STOP command and checked in the RM UI: the application is stopped and all 5 containers are released. It shows ZERO containers running.
>>>>>>>>
>>>>>>>> But when I log in to those machines, I can see the storm components still running there (ps -ef | grep storm). The processes are up, and even the storm UI is still accessible.
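To rule out strays from older experimental runs, the start time of each suspect process can be checked directly, e.g.:

  # lstart prints the full start timestamp of each matching process;
  # the [s]torm pattern keeps the grep itself out of the results
  ps -eo pid,lstart,args | grep '[s]torm'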
>>>>>>>>
>>>>>>>> On Tue, Apr 28, 2015 at 12:29 AM, Gour Saha <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Calling "slider stop" before "slider destroy" is the right order.
>>>>>>>>>
>>>>>>>>> On calling stop, your storm cluster should be completely stopped (including the Slider AM and all storm components).
>>>>>>>>>
>>>>>>>>> Can you run this command after stop and send the output (don't run destroy yet)?
>>>>>>>>>
>>>>>>>>> slider list <app-instance-name> --containers
>>>>>>>>>
>>>>>>>>> Also, at this point you should check the RM UI; it should show that the yarn app is in the stopped state.
>>>>>>>>>
>>>>>>>>> -Gour
>>>>>>>>>
>>>>>>>>> On 4/27/15, 11:52 AM, "Chackravarthy Esakkimuthu" <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I started storm on yarn (slider create) and then wanted to test whether destroying the storm cluster works or not. So I tried the following order:
>>>>>>>>>>
>>>>>>>>>> 1) slider stop <app-instance-name>
>>>>>>>>>> -- in this case, only the SliderAM stopped; all the other storm daemons (Nimbus, supervisor, log_viewer, drpc, UI_Server) kept running, along with the slider agents.
>>>>>>>>>>
>>>>>>>>>> Is this just an intermediate state before issuing the destroy command?
>>>>>>>>>>
>>>>>>>>>> 2) slider destroy <app-instance-name>
>>>>>>>>>> -- in this case, only nimbus and supervisor got killed. The other storm daemons (log_viewer, drpc, UI_Server) are still running, and so are the slider agents, in all 4 containers.
>>>>>>>>>>
>>>>>>>>>> I faced this issue with the 0.60 release, then tried the 0.71 release, but the same behaviour persists.
>>>>>>>>>>
>>>>>>>>>> Am I using the commands in the wrong way (or in the wrong order), or is this a real issue?
>>>>>>>>>>
>>>>>>>>>> Thanks in advance!
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Chackra
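For reference, the expected sequence discussed in this thread - stop, verify, then destroy - would look something like this (the instance name storm1 is assumed, as above):

  slider stop storm1                           # should take down the AM, the agents and all storm daemons
  slider list storm1                           # the instance should now show as FINISHED
  yarn application -list -appStates RUNNING    # the app id should no longer be listed
  ps -ef | grep '[s]torm'                      # on each node: no storm or agent processes left
  slider destroy storm1                        # run only once everything above checks out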
