To dig deeper, I would need to get hold of the Slider AM log (slider.log) and at least one of the agent logs (slider-agent.log), say the one for Nimbus.

They will be under /hadoop/yarn/log/<app_id>/<container_id>/, or, if the <app_id> directory under /hadoop/yarn/log is missing, you can run "yarn logs -applicationId <app_id>" and dump the output to a file.

It would also help if you could provide the NodeManager logs. They are under /var/log/hadoop-yarn/yarn/, with file names of the form yarn-yarn-nodemanager-<hostname>.log.
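For example, something along these lines should collect everything in one place (the application id below is just the one from your earlier runs, and the paths are the ones mentioned above; adjust both to match your cluster):

  # Dump the aggregated logs for the whole app into one file
  yarn logs -applicationId application_1428575950531_0013 > storm1-app.log

  # If the app directory is still on the node, read the raw logs directly
  ls /hadoop/yarn/log/application_1428575950531_0013/

  # NodeManager log on the host where the container ran
  less /var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-$(hostname).log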
-Gour

On 4/27/15, 1:32 PM, "Chackravarthy Esakkimuthu" <[email protected]> wrote:

> Run "slider start storm1" again, it should create application_1428575950531_0014 (with id 0014).
> ---> yes it does
>
> After that can you check if the processes from application_1428575950531_0013 are still running?
> ---> yes
>
> If yes, then run "slider stop storm1" again and then do you see processes from both application_1428575950531_0013 and application_1428575950531_0014 running?
> ---> yes, both are running, and I am able to access both storm UIs as well (only the SliderAM was stopped).
>
> On Tue, Apr 28, 2015 at 1:54 AM, Gour Saha <[email protected]> wrote:
>
>> Yes, those processes correspond to the slider agent.
>>
>> Based on the issue you are facing, let's do this -
>>
>> Run "slider start storm1" again, it should create application_1428575950531_0014 (with id 0014). After that can you check if the processes from application_1428575950531_0013 are still running? If yes, then run "slider stop storm1" again and then do you see processes from both application_1428575950531_0013 and application_1428575950531_0014 running?
>>
>> -Gour
>>
>> On 4/27/15, 1:11 PM, "Chackravarthy Esakkimuthu" <[email protected]> wrote:
>>
>>> And how do we confirm that the slider agents are stopped on each node where a container was allocated? Because even after the stop command, and even after the destroy command, the agents seem to be running on all those nodes:
>>>
>>> yarn 47909 47907 0 00:37 ? 00:00:00 /bin/bash -c python ./infra/agent/slider-agent/agent/main.py --label container_1428575950531_0013_01_000002___NIMBUS --zk-quorum host1:2181,host2:2181,host3:2181 --zk-reg-path /registry/users/yarn/services/org-apache-slider/storm1 > /var/log/hadoop-yarn/application_1428575950531_0013/container_1428575950531_0013_01_000002/slider-agent.out 2>&1
>>> yarn 47915 47909 0 00:37 ? 00:00:02 python ./infra/agent/slider-agent/agent/main.py --label container_1428575950531_0013_01_000002___NIMBUS --zk-quorum host1:2181,host2:2181,host3:2181 --zk-reg-path /registry/users/yarn/services/org-apache-slider/storm1
>>>
>>> Don't these processes correspond to the slider agent?
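As noted above, those are the agent processes. A quick way to tie a leftover agent back to a specific app instance is to grep the process list for the application id, since the agent's --label embeds the container id (the id below is the one from the ps output above):

  # Every process still referencing the supposedly stopped app instance
  ps -ef | grep '[a]pplication_1428575950531_0013'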
>>>
>>> On Tue, Apr 28, 2015 at 1:32 AM, Chackravarthy Esakkimuthu <[email protected]> wrote:
>>>
>>>> 1) slider create storm1
>>>> --- it started all the components, SliderAM and slider agents, and the storm UI was accessible. I also manually logged in to each host and verified that all components were up and running.
>>>>
>>>> 2) slider stop storm1
>>>> --- it stopped the SliderAM
>>>> --- but all the components kept running, along with the slider agents, and the storm UI was still accessible.
>>>>
>>>> 3) slider start storm1 (the RM UI was less responsive during this time)
>>>> --- it started another SliderAM and another set of storm components and slider agents, and I am able to access a storm UI on another host.
>>>>
>>>> So now two storm clusters are actually running, even though I used the same name "storm1".
>>>>
>>>> On Tue, Apr 28, 2015 at 1:23 AM, Gour Saha <[email protected]> wrote:
>>>>
>>>>> Hmm.. Interesting.
>>>>>
>>>>> Is it possible to run "ps -ef | grep storm" before and after the storm1 app is started and send the output?
>>>>>
>>>>> -Gour
>>>>>
>>>>> On 4/27/15, 12:48 PM, "Chackravarthy Esakkimuthu" <[email protected]> wrote:
>>>>>
>>>>>> No, the processes are not old ones, because their classpaths contain folder names corresponding to the newly launched application id. (Also, every time before launching a new application, I made sure that all processes were killed.)
>>>>>>
>>>>>> And the output of the list command is as follows:
>>>>>>
>>>>>> sudo -u yarn /usr/hdp/current/slider-client/bin/./slider list
>>>>>> 2015-04-28 01:14:24,568 [main] INFO impl.TimelineClientImpl - Timeline service address: http://host2:8188/ws/v1/timeline/
>>>>>> 2015-04-28 01:14:25,669 [main] INFO client.RMProxy - Connecting to ResourceManager at host2/XX.XX.XX.XX:8050
>>>>>> storm1    FINISHED    application_1428575950531_0013
>>>>>> 2015-04-28 01:14:26,108 [main] INFO util.ExitUtil - Exiting with status 0
>>>>>>
>>>>>> On Tue, Apr 28, 2015 at 1:01 AM, Gour Saha <[email protected]> wrote:
>>>>>>
>>>>>>> Sorry, forgot that --containers is supported in the develop branch only. Just run list without that option.
>>>>>>>
>>>>>>> It seems like the running processes are stray processes from old experimental runs. Can you check the date/time of these processes?
>>>>>>>
>>>>>>> If you bring the storm instance up again, do you see new instances of nimbus, supervisor, etc. getting created? The old stray ones will probably still be there.
>>>>>>>
>>>>>>> Also, can you run just "slider list" (no other params) and send the output?
>>>>>>>
>>>>>>> -Gour
>>>>>>>
>>>>>>> On 4/27/15, 12:20 PM, "Chackravarthy Esakkimuthu" <[email protected]> wrote:
>>>>>>>
>>>>>>>> There is some issue with that command's usage (I tried giving the params in a different order as well):
>>>>>>>>
>>>>>>>> sudo -u yarn /usr/hdp/current/slider-client/bin/./slider list storm1 --containers
>>>>>>>> 2015-04-28 00:42:01,017 [main] ERROR main.ServiceLauncher - com.beust.jcommander.ParameterException: Unknown option: --containers in list storm1 --containers
>>>>>>>> 2015-04-28 00:42:01,021 [main] INFO util.ExitUtil - Exiting with status 40
>>>>>>>>
>>>>>>>> Anyway, I issued the STOP command and checked in the RM UI: the application is stopped and all 5 containers are released. It shows ZERO containers running.
>>>>>>>>
>>>>>>>> But when I log in to those machines, I can see the storm components still running there (ps -ef | grep storm). The processes are up, and even the storm UI is still accessible.
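To rule out strays from older experimental runs, the start time of each suspect process can be checked directly, e.g.:

  # lstart prints the full start timestamp of each matching process;
  # the [s]torm pattern keeps the grep itself out of the results
  ps -eo pid,lstart,args | grep '[s]torm'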
>>>>>>>>
>>>>>>>> On Tue, Apr 28, 2015 at 12:29 AM, Gour Saha <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Calling "slider stop" before "slider destroy" is the right order.
>>>>>>>>>
>>>>>>>>> On calling stop, your storm cluster should be completely stopped (including the Slider AM and all storm components).
>>>>>>>>>
>>>>>>>>> Can you run this command after stop and send the output (don't run destroy yet)?
>>>>>>>>>
>>>>>>>>> slider list <app-instance-name> --containers
>>>>>>>>>
>>>>>>>>> Also, at this point you should check the RM UI; it should show that the yarn app is in the stopped state.
>>>>>>>>>
>>>>>>>>> -Gour
>>>>>>>>>
>>>>>>>>> On 4/27/15, 11:52 AM, "Chackravarthy Esakkimuthu" <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I started storm on yarn (slider create) and then wanted to test whether destroying the storm cluster works or not. So I tried the following order:
>>>>>>>>>>
>>>>>>>>>> 1) slider stop <app-instance-name>
>>>>>>>>>> -- in this case, only the SliderAM stopped; all the other storm daemons (Nimbus, supervisor, log_viewer, drpc, UI_Server) kept running, along with the slider agents.
>>>>>>>>>>
>>>>>>>>>> Is this just an intermediate state before issuing the destroy command?
>>>>>>>>>>
>>>>>>>>>> 2) slider destroy <app-instance-name>
>>>>>>>>>> -- in this case, only nimbus and supervisor got killed. The other storm daemons (log_viewer, drpc, UI_Server) are still running, and so are the slider agents, in all 4 containers.
>>>>>>>>>>
>>>>>>>>>> I faced this issue with the 0.60 release, then tried the 0.71 release, but the same behaviour persists.
>>>>>>>>>>
>>>>>>>>>> Am I using the commands in the wrong way (or in the wrong order), or is this a real issue?
>>>>>>>>>>
>>>>>>>>>> Thanks in advance!
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Chackra
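For reference, the expected sequence discussed in this thread - stop, verify, then destroy - would look something like this (the instance name storm1 is assumed, as above):

  slider stop storm1                           # should take down the AM, the agents and all storm daemons
  slider list storm1                           # the instance should now show as FINISHED
  yarn application -list -appStates RUNNING    # the app id should no longer be listed
  ps -ef | grep '[s]torm'                      # on each node: no storm or agent processes left
  slider destroy storm1                        # run only once everything above checks out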
