Chackra,

We believe you are running into a redirection issue that occurs when RM HA is set up:
https://issues.apache.org/jira/browse/YARN-1525
https://issues.apache.org/jira/browse/YARN-1811

These were fixed in Hadoop 2.6 (the version you have). But we still found issues with the Slider AM UI in Slider version 0.60 (the version you are using) on top of Hadoop 2.6. I thought we had filed a JIRA on it, but could not find one, so I went ahead and filed one now:
https://issues.apache.org/jira/browse/SLIDER-846

Workaround -
Is this a production cluster? If not, can you disable RM HA and check whether you can access the AM UI and also run all slider command lines successfully? This is a basic test to make sure that this is indeed happening because of the RM HA setup. Once we have verified the above, revert back to RM HA again.

I think we can make the Slider AM UI work in the RM HA setup by doing this (we haven't tested this, so not 100% sure it will work):
In the RM HA setup we can use YARN labels and constrain the Slider AM to come up on the active RM node. Let me know if you want to try this route and I would be happy to help you out with details on how to set this up.

-Gour

On 4/8/15, 9:17 AM, "Chackravarthy Esakkimuthu" <[email protected]> wrote:

>No, iptables is not enabled i think. (will confirm)
>But, AM is running, even other containers are running and I could see
>storm/hbase daemons running in those nodes.
>Does this mean installation is successful? How do I check the status of
>the installation?
>
>Tried using slider command with no success, (Please let me know if am I
>using it wrongly)
>- storm-yarn-1 and hb1 are the names which I used to for "slider create"
>command.
>
>/usr/hdp/current/slider-client/bin/./slider status *storm-yarn-1*
>2015-04-08 21:40:17,178 [main] INFO impl.TimelineClientImpl - Timeline service address: http://host2:8188/ws/v1/timeline/
>2015-04-08 21:40:17,782 [main] WARN shortcircuit.DomainSocketFactory - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
>2015-04-08 21:40:17,936 [main] INFO client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
>2015-04-08 21:40:17,970 [main] ERROR main.ServiceLauncher - *Unknown application instance : storm-yarn-1*
>2015-04-08 21:40:17,971 [main] INFO util.ExitUtil - Exiting with status 69
>
>/usr/hdp/current/slider-client/bin/./slider status *hb1*
>2015-04-08 21:40:31,344 [main] INFO impl.TimelineClientImpl - Timeline service address: http://host2:8188/ws/v1/timeline/
>2015-04-08 21:40:32,075 [main] WARN shortcircuit.DomainSocketFactory - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
>2015-04-08 21:40:32,263 [main] INFO client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
>2015-04-08 21:40:32,306 [main] ERROR main.ServiceLauncher - *Unknown application instance : hb1*
>2015-04-08 21:40:32,308 [main] INFO util.ExitUtil - Exiting with status 69
>
>
>On Wed, Apr 8, 2015 at 7:14 PM, Jon Maron <[email protected]> wrote:
>
>> Indications seem to be that the AM is started but the AM URI you’re
>> attempting to attach to may be mistaken or there may be something
>> preventing the actual connection. Any chance iptables is enabled?
>>
>>
>> > On Apr 8, 2015, at 3:44 AM, Gour Saha <[email protected]> wrote:
>> >
>> > Jon was right. I think Storm uses ${USER_NAME} for app_user instead of
>> > hard coding as yarn unlike hbase. So either users were fine.
>> >
>> > One thing I saw in the AM and RM urls is that they link to
>> > zs-aaa-001.nm.flipkart.com and zs-exp-01.nm.flipkart.com. Can you hand
>> > edit the AM URL to try both the host aliases?
>> >
>> > I am not sure if the above will work in which case if you could send the
>> > entire AM logs then it would be great.
>> >
>> > -Gour
>> >
>> > - Sent from my iPhone
>> >
>> >> On Apr 7, 2015, at 11:08 PM, "Chackravarthy Esakkimuthu" <[email protected]> wrote:
>> >>
>> >> Tried running with 'yarn' user, but it remains in same state.
>> >> AM link not working, and AM logs are similar.
>> >>
>> >> On Wed, Apr 8, 2015 at 2:14 AM, Gour Saha <[email protected]> wrote:
>> >>
>> >>> In a non-secured cluster you should run as yarn. Can you do that and let
>> >>> us know how it goes?
>> >>>
>> >>> Also you can stop your existing storm instance in hdfs user (run as hdfs
>> >>> user) by running stop first -
>> >>> slider stop storm1
>> >>>
>> >>> -Gour
>> >>>
>> >>> On 4/7/15, 1:39 PM, "Chackravarthy Esakkimuthu" <[email protected]> wrote:
>> >>>
>> >>>> This is not a secured cluster.
>> >>>> And yes, I used 'hdfs' user while running slider create.
>> >>>>
>> >>>>> On Wed, Apr 8, 2015 at 2:03 AM, Gour Saha <[email protected]> wrote:
>> >>>>>
>> >>>>> Which user are you running the slider create command as? Seems like you
>> >>>>> are running as hdfs user. Is this a secured cluster?
>> >>>>>
>> >>>>> -Gour
>> >>>>>
>> >>>>> On 4/7/15, 1:06 PM, "Chackravarthy Esakkimuthu" <[email protected]> wrote:
>> >>>>>
>> >>>>>> yes, RM HA has been setup in this cluster.
>> >>>>>>
>> >>>>>> Active : zs-aaa-001.nm.flipkart.com
>> >>>>>> Standby : zs-aaa-002.nm.flipkart.com
>> >>>>>>
>> >>>>>> RM Link : http://zs-aaa-001.nm.flipkart.com:8088/cluster/scheduler
>> >>>>>> <http://zs-exp-01.nm.flipkart.com:8088/cluster/scheduler>
>> >>>>>>
>> >>>>>> AM Link : http://zs-aaa-001.nm.flipkart.com:8088/proxy/application_1427882795362_0070/slideram
>> >>>>>> <http://zs-exp-01.nm.flipkart.com:8088/proxy/application_1427882795362_0070/slideram>
>> >>>>>>
>> >>>>>>> On Wed, Apr 8, 2015 at 1:05 AM, Gour Saha <[email protected]> wrote:
>> >>>>>>
>> >>>>>>> Sorry forgot that the AM link not working was the original issue.
>> >>>>>>>
>> >>>>>>> Few more things -
>> >>>>>>> - Seems like you have RM HA setup, right?
>> >>>>>>> - Can you copy paste the complete link of the RM UI and the URL of
>> >>>>>>> ApplicationMaster (the link which is broken) with actual hostnames?
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> -Gour
>> >>>>>>>
>> >>>>>>> On 4/7/15, 11:43 AM, "Chackravarthy Esakkimuthu" <[email protected]> wrote:
>> >>>>>>>
>> >>>>>>>> Since 5 containers are running, which means that Storm daemons are already
>> >>>>>>>> up and running?
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> Actually the ApplicationMaster link is not working. It just blanks out
>> >>>>>>>> printing the following :
>> >>>>>>>>
>> >>>>>>>> This is standby RM. Redirecting to the current active RM:
>> >>>>>>>> http://<host-name>:8088/proxy/application_1427882795362_0070/slideram
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> And for resources.json, I dint make any change and used the copy of
>> >>>>>>>> resources-default.json as follows:
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> {
>> >>>>>>>>   "schema" : "http://example.org/specification/v2.0.0",
>> >>>>>>>>   "metadata" : {
>> >>>>>>>>   },
>> >>>>>>>>   "global" : {
>> >>>>>>>>     "yarn.log.include.patterns": "",
>> >>>>>>>>     "yarn.log.exclude.patterns": ""
>> >>>>>>>>   },
>> >>>>>>>>   "components": {
>> >>>>>>>>     "slider-appmaster": {
>> >>>>>>>>       "yarn.memory": "512"
>> >>>>>>>>     },
>> >>>>>>>>     "NIMBUS": {
>> >>>>>>>>       "yarn.role.priority": "1",
>> >>>>>>>>       "yarn.component.instances": "1",
>> >>>>>>>>       "yarn.memory": "2048"
>> >>>>>>>>     },
>> >>>>>>>>     "STORM_UI_SERVER": {
>> >>>>>>>>       "yarn.role.priority": "2",
>> >>>>>>>>       "yarn.component.instances": "1",
>> >>>>>>>>       "yarn.memory": "1278"
>> >>>>>>>>     },
>> >>>>>>>>     "DRPC_SERVER": {
>> >>>>>>>>       "yarn.role.priority": "3",
>> >>>>>>>>       "yarn.component.instances": "1",
>> >>>>>>>>       "yarn.memory": "1278"
>> >>>>>>>>     },
>> >>>>>>>>     "SUPERVISOR": {
>> >>>>>>>>       "yarn.role.priority": "4",
>> >>>>>>>>       "yarn.component.instances": "1",
>> >>>>>>>>       "yarn.memory": "3072"
>> >>>>>>>>     }
>> >>>>>>>>   }
>> >>>>>>>> }
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>> On Tue, Apr 7, 2015 at 11:52 PM, Gour Saha <[email protected]> wrote:
>> >>>>>>>>
>> >>>>>>>>> Chackra sent the attachment directly to me. From what I see the cluster
>> >>>>>>>>> resources (memory and cores) are abundant.
>> >>>>>>>>>
>> >>>>>>>>> But I also see that only 1 app is running which is the one we are trying
>> >>>>>>>>> to debug and 5 containers are running. So definitely more containers that
>> >>>>>>>>> just the AM is running.
>> >>>>>>>>>
>> >>>>>>>>> Can you click on the app master link and copy paste the content of that
>> >>>>>>>>> page? No need for screen shot. Also please send your resources JSON
>> >>>>>>>>> file.
>> >>>>>>>>>
>> >>>>>>>>> -Gour
>> >>>>>>>>>
>> >>>>>>>>> - Sent from my iPhone
>> >>>>>>>>>
>> >>>>>>>>>> On Apr 7, 2015, at 11:01 AM, "Jon Maron" <[email protected]> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>> On Apr 7, 2015, at 1:36 PM, Chackravarthy Esakkimuthu <[email protected]<mailto:[email protected]>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>> @Maron, I could not get the logs even though the application is still running.
>> >>>>>>>>>> It's a 10 node cluster and I logged into one of the node and executed the command :
>> >>>>>>>>>>
>> >>>>>>>>>> sudo -u hdfs yarn logs -applicationId application_1427882795362_0070
>> >>>>>>>>>> 15/04/07 22:56:09 INFO impl.TimelineClientImpl: Timeline service address: http://$HOST:PORT/ws/v1/timeline/
>> >>>>>>>>>> 15/04/07 22:56:09 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
>> >>>>>>>>>> /app-logs/hdfs/logs/application_1427882795362_0070does not have any log files.
>> >>>>>>>>>>
>> >>>>>>>>>> Can you login to the cluster node and look at the logs directory (e.g. in HDP install it would be under /hadoop/yarn/logs IIRC)?
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> @Gour, Please find the attachment.
>> >>>>>>>>>>
>> >>>>>>>>>> On Tue, Apr 7, 2015 at 10:57 PM, Gour Saha <[email protected]<mailto:[email protected]>> wrote:
>> >>>>>>>>>> Can you take a screenshot of your RM UI and send it over? It is usually
>> >>>>>>>>>> available in a URI similar to http://c6410.ambari.apache.org:8088/cluster.
>> >>>>>>>>>> I am specifically interested in seeing the Cluster Metrics table.
>> >>>>>>>>>>
>> >>>>>>>>>> -Gour
>> >>>>>>>>>>
>> >>>>>>>>>>> On 4/7/15, 10:17 AM, "Jon Maron" <[email protected]<mailto:[email protected]>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>>> On Apr 7, 2015, at 1:14 PM, Jon Maron <[email protected]<mailto:[email protected]>> wrote:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> On Apr 7, 2015, at 1:08 PM, Chackravarthy Esakkimuthu
>> >>>>>>>>>>>>> <[email protected]<mailto:[email protected]>> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Thanks for the reply guys!
>> >>>>>>>>>>>>> Contianer allocation happened successfully.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> *RoleStatus{name='slider-appmaster', key=0, minimum=0, maximum=1, desired=1, actual=1,*
>> >>>>>>>>>>>>> *RoleStatus{name='STORM_UI_SERVER', key=2, minimum=0, maximum=1, desired=1, actual=1, *
>> >>>>>>>>>>>>> *RoleStatus{name='NIMBUS', key=1, minimum=0, maximum=1, desired=1, actual=1, *
>> >>>>>>>>>>>>> *RoleStatus{name='DRPC_SERVER', key=3, minimum=0, maximum=1, desired=1, actual=1, *
>> >>>>>>>>>>>>> *RoleStatus{name='SUPERVISOR', key=4, minimum=0, maximum=1, desired=1, actual=1,*
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Also, have put some logs specific to a container.. (nimbus) Same set of
>> >>>>>>>>>>>>> logs available for other Roles also (except Supervisor which has only first
>> >>>>>>>>>>>>> 2 lines of below logs)
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> *Installing NIMBUS on container_e04_1427882795362_0070_01_000002.*
>> >>>>>>>>>>>>> *Starting NIMBUS on container_e04_1427882795362_0070_01_000002.*
>> >>>>>>>>>>>>> *Registering component container_e04_1427882795362_0070_01_000002*
>> >>>>>>>>>>>>> *Requesting applied config for NIMBUS on container_e04_1427882795362_0070_01_000002.*
>> >>>>>>>>>>>>> *Received and processed config for container_e04_1427882795362_0070_01_000002___NIMBUS*
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Does this result in any intermediate state?
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> @Maron, I didn't configure any port specifically.. do I need to to? Also, i
>> >>>>>>>>>>>>> don't see any error msg in AM logs wrt port conflict.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> My only concern was whether you were actually accession the web UIs at
>> >>>>>>>>>>>> the correct host and port. If you are then the next step is probably to
>> >>>>>>>>>>>> look at the actual storm/hbase logs. you can use the "yarn logs
>> >>>>>>>>>>>> -applicationid .." command.
>> >>>>>>>>>>>
>> >>>>>>>>>>> *accessing* ;)
>> >>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Thanks,
>> >>>>>>>>>>>>> Chackra
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Tue, Apr 7, 2015 at 9:02 PM, Jon Maron <[email protected]<mailto:[email protected]>> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> On Apr 7, 2015, at 11:03 AM, Billie Rinaldi <[email protected]<mailto:[email protected]>> wrote:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> One thing you can check is whether your system has enough resources to
>> >>>>>>>>>>>>>>> allocate all the containers the app needs. You will see info like the
>> >>>>>>>>>>>>>>> following in the AM log (it will be logged multiple times over the life of
>> >>>>>>>>>>>>>>> the AM). In this case, the master I requested was allocated but the
>> >>>>>>>>>>>>>>> tservers were not.
>> >>>>>>>>>>>>>>> RoleStatus{name='ACCUMULO_TSERVER', key=2, desired=2, actual=0,
>> >>>>>>>>>>>>>>> requested=2, releasing=0, failed=0, started=0, startFailed=0, completed=0,
>> >>>>>>>>>>>>>>> failureMessage=''}
>> >>>>>>>>>>>>>>> RoleStatus{name='ACCUMULO_MASTER', key=1, desired=1, actual=1, requested=0,
>> >>>>>>>>>>>>>>> releasing=0, failed=0, started=0, startFailed=0, completed=0,
>> >>>>>>>>>>>>>>> failureMessage=''}
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> You can also check the "Scheduler" link on the RM Web UI to get a sense of
>> >>>>>>>>>>>>>> whether you are resource constrained.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Are you certain that you are attempting to invoke the correct port? The
>> >>>>>>>>>>>>>> listening ports are dynamically allocated by Slider.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> On Tue, Apr 7, 2015 at 3:29 AM, Chackravarthy Esakkimuthu <[email protected]<mailto:[email protected]>> wrote:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Hi All,
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> I am new to Apache slider and would like to contribute.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Just to start with, I am trying out running "storm" and "hbase" on yarn
>> >>>>>>>>>>>>>>>> using slider following the guide :
>> >>>>>>>>>>>>>>>> http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.0/YARN_RM_v22/running_applications_on_slider/index.html#Item1.1
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> In both (storm and hbase) the cases, the ApplicationMaster gets launched
>> >>>>>>>>>>>>>>>> and still running, but the ApplicationMaster link not working, and from AM
>> >>>>>>>>>>>>>>>> logs, I don't see any errors.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> How do I debug from this? Please help me.
>> >>>>>>>>>>>>>>>> Incase if there is any other mail thread with respect this, please point
>> >>>>>>>>>>>>>>>> out to me. Thanks in advance.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Thanks,
>> >>>>>>>>>>>>>>>> Chackra
>> >>>
>> >>>
>> >>
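
The check Billie describes in the thread (comparing desired and actual in the RoleStatus entries of the AM log) can be scripted. The following is only a minimal Python sketch and is not part of Slider: the function names parse_role_status and under_allocated are hypothetical, and it assumes the RoleStatus{name='...', desired=..., actual=...} text layout seen in the log excerpts quoted in this thread.

```python
import re

# Matches one RoleStatus entry. The body runs lazily up to the closing brace,
# the next entry, or end of input, because the lines pasted in this thread are
# sometimes cut off before the closing brace.
ROLE_RE = re.compile(
    r"RoleStatus\{name='([^']+)'(.*?)(?=\}|RoleStatus\{|\Z)", re.DOTALL
)
# Integer-valued fields such as desired=2 or actual=0.
FIELD_RE = re.compile(r"(\w+)=(\d+)")


def parse_role_status(log_text):
    """Return {component name: {field: int}}; later entries overwrite earlier ones."""
    roles = {}
    for name, body in ROLE_RE.findall(log_text):
        roles[name] = {key: int(value) for key, value in FIELD_RE.findall(body)}
    return roles


def under_allocated(roles):
    """Names of components whose actual container count is below desired."""
    return sorted(
        name
        for name, fields in roles.items()
        if fields.get("actual", 0) < fields.get("desired", 0)
    )


# Sample taken from the AM log excerpt quoted in this thread.
SAMPLE = (
    "RoleStatus{name='ACCUMULO_TSERVER', key=2, desired=2, actual=0, "
    "requested=2, releasing=0, failed=0, started=0, startFailed=0, "
    "completed=0, failureMessage=''}\n"
    "RoleStatus{name='ACCUMULO_MASTER', key=1, desired=1, actual=1, "
    "requested=0, releasing=0, failed=0, started=0, startFailed=0, "
    "completed=0, failureMessage=''}\n"
)

if __name__ == "__main__":
    print(under_allocated(parse_role_status(SAMPLE)))  # ['ACCUMULO_TSERVER']
```

For the Storm instance discussed here every component reported desired=1, actual=1, so such a check would return an empty list, which is consistent with container allocation not being the cause of the broken AM link.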
