Sure Gour, I will check with RM HA disabled and then get in touch with you to test the workaround with RM HA enabled. Thanks a lot!!
On Wed, Apr 8, 2015 at 10:59 PM, Gour Saha <[email protected]> wrote:

> Chackra,
>
> We believe you are running into a redirection issue when RM HA is set up -
> https://issues.apache.org/jira/browse/YARN-1525
> https://issues.apache.org/jira/browse/YARN-1811
>
> These were fixed in Hadoop 2.6 (the version you have). But we still found
> issues with the Slider AM UI in Slider version 0.60 (the version you are
> using) on top of Hadoop 2.6.
>
> I thought we filed a JIRA on it, but could not find any. I went ahead and
> filed one now -
> https://issues.apache.org/jira/browse/SLIDER-846
>
> Workaround -
> Is this a production cluster? If not, can you disable RM HA and check if
> you can access the AM UI and also run all slider command lines
> successfully? This is a basic test to ensure that this is indeed happening
> because of the RM HA setup.
>
> Once we verify the above, revert back to RM HA again. I think we can make
> the Slider AM UI work in the RM HA setup by doing this (we haven't tested
> this, so not 100% sure it will work) -
>
> In the RM HA setup we can use YARN labels and constrain the Slider AM to
> come up on the active RM node. Let me know if you want to try this route
> and I would be happy to help you out with details on how to set this up.
>
> -Gour
>
> On 4/8/15, 9:17 AM, "Chackravarthy Esakkimuthu" <[email protected]> wrote:
>
>> No, iptables is not enabled, I think. (will confirm)
>> But the AM is running, other containers are running too, and I can see
>> storm/hbase daemons running on those nodes.
>> Does this mean the installation is successful? How do I check the status
>> of the installation?
>>
>> Tried using the slider command with no success. (Please let me know if I
>> am using it wrongly.)
>> - storm-yarn-1 and hb1 are the names which I used for the "slider create"
>> command.
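A sketch of the label-based workaround Gour describes above (untested, as he himself notes): create a node label, attach it to the active RM host, and pin the slider-appmaster component to it in resources.json. The label name "active-rm" is made up for illustration, and `yarn.label.expression` is assumed here to be the Slider component property for label placement:

```json
{
  "components": {
    "slider-appmaster": {
      "yarn.memory": "512",
      "yarn.label.expression": "active-rm"
    }
  }
}
```

The label itself would be created and mapped first, e.g. with `yarn rmadmin -addToClusterNodeLabels active-rm` followed by `yarn rmadmin -replaceLabelsOnNode "zs-aaa-001.nm.flipkart.com=active-rm"` (exact rmadmin syntax varies by Hadoop version). The obvious drawback: after an RM failover the label would have to be moved to the new active host and the AM restarted.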
>>
>> /usr/hdp/current/slider-client/bin/./slider status storm-yarn-1
>> 2015-04-08 21:40:17,178 [main] INFO impl.TimelineClientImpl - Timeline
>> service address: http://host2:8188/ws/v1/timeline/
>> 2015-04-08 21:40:17,782 [main] WARN shortcircuit.DomainSocketFactory -
>> The short-circuit local reads feature cannot be used because libhadoop
>> cannot be loaded.
>> 2015-04-08 21:40:17,936 [main] INFO
>> client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
>> 2015-04-08 21:40:17,970 [main] ERROR main.ServiceLauncher - Unknown
>> application instance : storm-yarn-1
>> 2015-04-08 21:40:17,971 [main] INFO util.ExitUtil - Exiting with status 69
>>
>> /usr/hdp/current/slider-client/bin/./slider status hb1
>> 2015-04-08 21:40:31,344 [main] INFO impl.TimelineClientImpl - Timeline
>> service address: http://host2:8188/ws/v1/timeline/
>> 2015-04-08 21:40:32,075 [main] WARN shortcircuit.DomainSocketFactory -
>> The short-circuit local reads feature cannot be used because libhadoop
>> cannot be loaded.
>> 2015-04-08 21:40:32,263 [main] INFO
>> client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
>> 2015-04-08 21:40:32,306 [main] ERROR main.ServiceLauncher - Unknown
>> application instance : hb1
>> 2015-04-08 21:40:32,308 [main] INFO util.ExitUtil - Exiting with status 69
>>
>> On Wed, Apr 8, 2015 at 7:14 PM, Jon Maron <[email protected]> wrote:
>>
>>> Indications seem to be that the AM is started, but the AM URI you're
>>> attempting to attach to may be mistaken, or there may be something
>>> preventing the actual connection. Any chance iptables is enabled?
>>>
>>>> On Apr 8, 2015, at 3:44 AM, Gour Saha <[email protected]> wrote:
>>>>
>>>> Jon was right. I think Storm uses ${USER_NAME} for app_user instead of
>>>> hard-coding it as yarn, unlike hbase. So either user was fine.
>>>>
>>>> One thing I saw in the AM and RM URLs is that they link to
>>>> zs-aaa-001.nm.flipkart.com and zs-exp-01.nm.flipkart.com. Can you hand
>>>> edit the AM URL to try both the host aliases?
>>>>
>>>> I am not sure if the above will work, in which case if you could send
>>>> the entire AM logs it would be great.
>>>>
>>>> -Gour
>>>>
>>>> - Sent from my iPhone
>>>>
>>>>> On Apr 7, 2015, at 11:08 PM, "Chackravarthy Esakkimuthu"
>>>>> <[email protected]> wrote:
>>>>>
>>>>> Tried running with the 'yarn' user, but it remains in the same state.
>>>>> The AM link is not working, and the AM logs are similar.
>>>>>
>>>>> On Wed, Apr 8, 2015 at 2:14 AM, Gour Saha <[email protected]> wrote:
>>>>>
>>>>>> In a non-secured cluster you should run as yarn. Can you do that and
>>>>>> let us know how it goes?
>>>>>>
>>>>>> Also you can stop your existing storm instance in hdfs user (run as
>>>>>> hdfs user) by running stop first -
>>>>>> slider stop storm1
>>>>>>
>>>>>> -Gour
>>>>>>
>>>>>> On 4/7/15, 1:39 PM, "Chackravarthy Esakkimuthu"
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>> This is not a secured cluster.
>>>>>>> And yes, I used the 'hdfs' user while running slider create.
>>>>>>>
>>>>>>>> On Wed, Apr 8, 2015 at 2:03 AM, Gour Saha <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Which user are you running the slider create command as? Seems like
>>>>>>>> you are running as the hdfs user. Is this a secured cluster?
>>>>>>>>
>>>>>>>> -Gour
>>>>>>>>
>>>>>>>> On 4/7/15, 1:06 PM, "Chackravarthy Esakkimuthu"
>>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Yes, RM HA has been set up in this cluster.
>>>>>>>>>
>>>>>>>>> Active : zs-aaa-001.nm.flipkart.com
>>>>>>>>> Standby : zs-aaa-002.nm.flipkart.com
>>>>>>>>>
>>>>>>>>> RM Link : http://zs-aaa-001.nm.flipkart.com:8088/cluster/scheduler
>>>>>>>>> <http://zs-exp-01.nm.flipkart.com:8088/cluster/scheduler>
>>>>>>>>>
>>>>>>>>> AM Link : http://zs-aaa-001.nm.flipkart.com:8088/proxy/application_1427882795362_0070/slideram
>>>>>>>>> <http://zs-exp-01.nm.flipkart.com:8088/proxy/application_1427882795362_0070/slideram>
>>>>>>>>>
>>>>>>>>> On Wed, Apr 8, 2015 at 1:05 AM, Gour Saha <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Sorry, forgot that the AM link not working was the original issue.
>>>>>>>>>>
>>>>>>>>>> A few more things -
>>>>>>>>>> - Seems like you have RM HA set up, right?
>>>>>>>>>> - Can you copy-paste the complete link of the RM UI and the URL of
>>>>>>>>>> the ApplicationMaster (the link which is broken) with actual
>>>>>>>>>> hostnames?
>>>>>>>>>>
>>>>>>>>>> -Gour
>>>>>>>>>>
>>>>>>>>>> On 4/7/15, 11:43 AM, "Chackravarthy Esakkimuthu"
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Since 5 containers are running, does that mean the Storm daemons
>>>>>>>>>>> are already up and running?
>>>>>>>>>>>
>>>>>>>>>>> Actually the ApplicationMaster link is not working. It just
>>>>>>>>>>> blanks out, printing the following:
>>>>>>>>>>>
>>>>>>>>>>> This is standby RM.
>>>>>>>>>>> Redirecting to the current active RM:
>>>>>>>>>>> http://<host-name>:8088/proxy/application_1427882795362_0070/slideram
>>>>>>>>>>>
>>>>>>>>>>> And for resources.json, I didn't make any change and used the
>>>>>>>>>>> copy of resources-default.json as follows:
>>>>>>>>>>>
>>>>>>>>>>> {
>>>>>>>>>>>   "schema" : "http://example.org/specification/v2.0.0",
>>>>>>>>>>>   "metadata" : {
>>>>>>>>>>>   },
>>>>>>>>>>>   "global" : {
>>>>>>>>>>>     "yarn.log.include.patterns": "",
>>>>>>>>>>>     "yarn.log.exclude.patterns": ""
>>>>>>>>>>>   },
>>>>>>>>>>>   "components": {
>>>>>>>>>>>     "slider-appmaster": {
>>>>>>>>>>>       "yarn.memory": "512"
>>>>>>>>>>>     },
>>>>>>>>>>>     "NIMBUS": {
>>>>>>>>>>>       "yarn.role.priority": "1",
>>>>>>>>>>>       "yarn.component.instances": "1",
>>>>>>>>>>>       "yarn.memory": "2048"
>>>>>>>>>>>     },
>>>>>>>>>>>     "STORM_UI_SERVER": {
>>>>>>>>>>>       "yarn.role.priority": "2",
>>>>>>>>>>>       "yarn.component.instances": "1",
>>>>>>>>>>>       "yarn.memory": "1278"
>>>>>>>>>>>     },
>>>>>>>>>>>     "DRPC_SERVER": {
>>>>>>>>>>>       "yarn.role.priority": "3",
>>>>>>>>>>>       "yarn.component.instances": "1",
>>>>>>>>>>>       "yarn.memory": "1278"
>>>>>>>>>>>     },
>>>>>>>>>>>     "SUPERVISOR": {
>>>>>>>>>>>       "yarn.role.priority": "4",
>>>>>>>>>>>       "yarn.component.instances": "1",
>>>>>>>>>>>       "yarn.memory": "3072"
>>>>>>>>>>>     }
>>>>>>>>>>>   }
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Apr 7, 2015 at 11:52 PM, Gour Saha <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Chackra sent the attachment directly to me. From what I see, the
>>>>>>>>>>>> cluster resources (memory and cores) are abundant.
>>>>>>>>>>>>
>>>>>>>>>>>> But I also see that only 1 app is running, which is the one we
>>>>>>>>>>>> are trying to debug, and 5 containers are running. So definitely
>>>>>>>>>>>> more containers than just the AM are running.
>>>>>>>>>>>>
>>>>>>>>>>>> Can you click on the app master link and copy-paste the content
>>>>>>>>>>>> of that page? No need for a screenshot. Also please send your
>>>>>>>>>>>> resources JSON file.
>>>>>>>>>>>>
>>>>>>>>>>>> -Gour
>>>>>>>>>>>>
>>>>>>>>>>>> - Sent from my iPhone
>>>>>>>>>>>>
>>>>>>>>>>>> On Apr 7, 2015, at 11:01 AM, "Jon Maron" <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On Apr 7, 2015, at 1:36 PM, Chackravarthy Esakkimuthu
>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> @Maron, I could not get the logs even though the application
>>>>>>>>>>>>>> is still running.
>>>>>>>>>>>>>> It's a 10-node cluster; I logged into one of the nodes and
>>>>>>>>>>>>>> executed the command:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> sudo -u hdfs yarn logs -applicationId application_1427882795362_0070
>>>>>>>>>>>>>> 15/04/07 22:56:09 INFO impl.TimelineClientImpl: Timeline
>>>>>>>>>>>>>> service address: http://$HOST:PORT/ws/v1/timeline/
>>>>>>>>>>>>>> 15/04/07 22:56:09 INFO client.ConfiguredRMFailoverProxyProvider:
>>>>>>>>>>>>>> Failing over to rm2
>>>>>>>>>>>>>> /app-logs/hdfs/logs/application_1427882795362_0070 does not
>>>>>>>>>>>>>> have any log files.
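One likely reason for the empty "yarn logs" result above while the application is still running: the CLI reads aggregated logs from HDFS, and NodeManagers typically upload them only after containers finish, and only when log aggregation is enabled. A hedged yarn-site.xml fragment (property names are from stock Hadoop 2.x; the /app-logs value matches the path seen in the output above):

```xml
<!-- yarn-site.xml: NodeManager log aggregation (stock Hadoop 2.x properties) -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <!-- HDFS directory aggregated logs are uploaded to; matches /app-logs above -->
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/app-logs</value>
</property>
```

While containers are still running, their logs live on the individual NodeManager hosts under yarn.nodemanager.log-dirs, which is what Jon's suggestion about checking the local logs directory gets at.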
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you log in to the cluster node and look at the logs
>>>>>>>>>>>>> directory (e.g. in an HDP install it would be under
>>>>>>>>>>>>> /hadoop/yarn/logs IIRC)?
>>>>>>>>>>>>>
>>>>>>>>>>>>>> @Gour, Please find the attachment.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Apr 7, 2015 at 10:57 PM, Gour Saha <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can you take a screenshot of your RM UI and send it over? It
>>>>>>>>>>>>>>> is usually available at a URI similar to
>>>>>>>>>>>>>>> http://c6410.ambari.apache.org:8088/cluster.
>>>>>>>>>>>>>>> I am specifically interested in seeing the Cluster Metrics
>>>>>>>>>>>>>>> table.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Gour
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 4/7/15, 10:17 AM, "Jon Maron" <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Apr 7, 2015, at 1:14 PM, Jon Maron <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Apr 7, 2015, at 1:08 PM, Chackravarthy Esakkimuthu
>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks for the reply, guys!
>>>>>>>>>>>>>>>>>> Container allocation happened successfully.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> RoleStatus{name='slider-appmaster', key=0, minimum=0, maximum=1, desired=1, actual=1,
>>>>>>>>>>>>>>>>>> RoleStatus{name='STORM_UI_SERVER', key=2, minimum=0, maximum=1, desired=1, actual=1,
>>>>>>>>>>>>>>>>>> RoleStatus{name='NIMBUS', key=1, minimum=0, maximum=1, desired=1, actual=1,
>>>>>>>>>>>>>>>>>> RoleStatus{name='DRPC_SERVER', key=3, minimum=0, maximum=1, desired=1, actual=1,
>>>>>>>>>>>>>>>>>> RoleStatus{name='SUPERVISOR', key=4, minimum=0, maximum=1, desired=1, actual=1,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Also, I have put some logs specific to a container (nimbus).
>>>>>>>>>>>>>>>>>> The same set of logs is available for the other roles too
>>>>>>>>>>>>>>>>>> (except SUPERVISOR, which has only the first 2 lines of the
>>>>>>>>>>>>>>>>>> logs below):
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Installing NIMBUS on container_e04_1427882795362_0070_01_000002.
>>>>>>>>>>>>>>>>>> Starting NIMBUS on container_e04_1427882795362_0070_01_000002.
>>>>>>>>>>>>>>>>>> Registering component container_e04_1427882795362_0070_01_000002
>>>>>>>>>>>>>>>>>> Requesting applied config for NIMBUS on container_e04_1427882795362_0070_01_000002.
>>>>>>>>>>>>>>>>>> Received and processed config for container_e04_1427882795362_0070_01_000002___NIMBUS
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Does this result in any intermediate state?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> @Maron, I didn't configure any port specifically.. do I
>>>>>>>>>>>>>>>>>> need to? Also, I don't see any error msg in the AM logs wrt
>>>>>>>>>>>>>>>>>> port conflict.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> My only concern was whether you were actually accession the
>>>>>>>>>>>>>>>>> web UIs at the correct host and port. If you are, then the
>>>>>>>>>>>>>>>>> next step is probably to look at the actual storm/hbase
>>>>>>>>>>>>>>>>> logs. You can use the "yarn logs -applicationId .." command.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *accessing* ;)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Chackra
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, Apr 7, 2015 at 9:02 PM, Jon Maron <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Apr 7, 2015, at 11:03 AM, Billie Rinaldi <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> One thing you can check is whether your system has enough
>>>>>>>>>>>>>>>>>>>> resources to allocate all the containers the app needs.
>>>>>>>>>>>>>>>>>>>> You will see info like the following in the AM log (it
>>>>>>>>>>>>>>>>>>>> will be logged multiple times over the life of the AM).
>>>>>>>>>>>>>>>>>>>> In this case, the master I requested was allocated but
>>>>>>>>>>>>>>>>>>>> the tservers were not.
>>>>>>>>>>>>>>>>>>>> RoleStatus{name='ACCUMULO_TSERVER', key=2, desired=2, actual=0, requested=2, releasing=0, failed=0, started=0, startFailed=0, completed=0, failureMessage=''}
>>>>>>>>>>>>>>>>>>>> RoleStatus{name='ACCUMULO_MASTER', key=1, desired=1, actual=1, requested=0, releasing=0, failed=0, started=0, startFailed=0, completed=0, failureMessage=''}
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> You can also check the "Scheduler" link on the RM Web UI
>>>>>>>>>>>>>>>>>>> to get a sense of whether you are resource constrained.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Are you certain that you are attempting to invoke the
>>>>>>>>>>>>>>>>>>> correct port? The listening ports are dynamically
>>>>>>>>>>>>>>>>>>> allocated by Slider.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Tue, Apr 7, 2015 at 3:29 AM, Chackravarthy Esakkimuthu
>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I am new to Apache Slider and would like to contribute.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Just to start with, I am trying out running "storm" and
>>>>>>>>>>>>>>>>>>>>> "hbase" on YARN using Slider, following the guide:
>>>>>>>>>>>>>>>>>>>>> http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.0/YARN_RM_v22/running_applications_on_slider/index.html#Item1.1
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> In both cases (storm and hbase), the ApplicationMaster
>>>>>>>>>>>>>>>>>>>>> gets launched and is still running, but the
>>>>>>>>>>>>>>>>>>>>> ApplicationMaster link is not working, and from the AM
>>>>>>>>>>>>>>>>>>>>> logs I don't see any errors.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> How do I debug from this? Please help me.
>>>>>>>>>>>>>>>>>>>>> In case there is any other mail thread with respect to
>>>>>>>>>>>>>>>>>>>>> this, please point me to it. Thanks in advance.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> Chackra
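A small aside for anyone replaying this thread: the "slider status" runs earlier exited with status 69 alongside the "Unknown application instance" error. A tiny hypothetical shell helper to interpret that; the 69 mapping is taken directly from the output in this thread, and everything else here is assumption:

```shell
# Interpreting the Slider client exit codes seen in this thread.
# The 69 -> "unknown application instance" mapping comes from the
# 'slider status' output above; the other entries are assumptions.
explain_slider_exit() {
  case "$1" in
    0)  echo "success" ;;
    69) echo "unknown application instance" ;;
    *)  echo "failed with exit code $1" ;;
  esac
}

explain_slider_exit 69
```

In this thread the likely cause was simply that the names passed to "slider status" did not match any instance registered for that user; running "slider list" as the same user that ran "slider create" shows which instance names are registered.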
