Is the registry enabled? Look for the following properties:
1. hadoop.registry.zk.quorum
2. hadoop.registry.rm.enabled=true

IIRC, both should be set in yarn-site.xml.

On Apr 9, 2015, at 7:51 AM, Chackravarthy Esakkimuthu wrote:

Gour,

Yes, there is progress :) I disabled RM HA and then installed the same Storm cluster. Now the AM link works and provides the info. But I still cannot tell whether the Storm installation succeeded. I tried to get the status of the installed cluster using the storm-slider client, and it failed as follows:

    /usr/hdp/2.2.0.0-2041/storm-slider-client/bin/storm-slider --app storm2 quicklinks
    2015-04-09 17:14:42,914 [main] INFO impl.TimelineClientImpl - Timeline service address: http://host2:8188/ws/v1/timeline/
    2015-04-09 17:14:43,506 [main] WARN shortcircuit.DomainSocketFactory - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
    2015-04-09 17:14:43,513 [main] INFO client.RMProxy - Connecting to ResourceManager at host2/xx.xx.xxx.xxx:8050
    2015-04-09 17:14:43,608 [main] INFO imps.CuratorFrameworkImpl - Starting
    2015-04-09 17:14:43,639 [main-EventThread] INFO state.ConnectionStateManager - State change: CONNECTED
    2015-04-09 17:14:43,639 [ConnectionStateManager-0] WARN state.ConnectionStateManager - There are no ConnectionStateListeners registered.
    2015-04-09 17:14:44,667 [main] ERROR main.ServiceLauncher - /registry/users/chackaravarthy.e/services/org-apache-slider/storm2
    2015-04-09 17:14:44,671 [main] INFO util.ExitUtil - Exiting with status 44
    Failed to read slider deployed storm config

Thanks,
Chackra

On Wed, Apr 8, 2015 at 11:16 PM, Chackravarthy Esakkimuthu wrote:

Sure Gour, I will check with RM HA disabled and then get in touch with you to test the workaround with RM HA enabled. Thanks a lot!!
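You can act on the registry question at the top of the thread by checking yarn-site.xml for the two properties it names. Below is a minimal Python sketch of that check. It assumes the standard Hadoop `<configuration>/<property>/<name>/<value>` layout. `registry_problems` is a hypothetical helper, not part of Slider or Hadoop. It treats an empty quorum, or any `hadoop.registry.rm.enabled` value other than `true`, as a problem; that rule is this sketch's own assumption. The ZooKeeper hosts in the sample are placeholders.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        if name:
            props[name] = value
    return props


def registry_problems(xml_text):
    """List problems with the registry settings; an empty list means both look set."""
    props = read_properties(xml_text)
    problems = []
    # Assumption: an empty or missing quorum means the registry cannot reach ZooKeeper.
    if not props.get("hadoop.registry.zk.quorum"):
        problems.append("hadoop.registry.zk.quorum is missing or empty")
    # The reply above expects this flag to be exactly "true".
    if props.get("hadoop.registry.rm.enabled", "").lower() != "true":
        problems.append("hadoop.registry.rm.enabled is not set to true")
    return problems


# Hypothetical yarn-site.xml fragment: quorum set, RM registry flag missing.
sample = """<configuration>
  <property><name>hadoop.registry.zk.quorum</name><value>host1:2181,host2:2181</value></property>
</configuration>"""

print(registry_problems(sample))
```

Run it against the yarn-site.xml used by both the ResourceManager and the Slider client, because the client also reads the registry.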
On Wed, Apr 8, 2015 at 10:59 PM, Gour Saha wrote:

Chackra,

We believe you are running into the redirection issue that occurs when RM HA is set up:
https://issues.apache.org/jira/browse/YARN-1525
https://issues.apache.org/jira/browse/YARN-1811

These were fixed in Hadoop 2.6 (the version you have). However, we still found issues with the Slider AM UI in Slider 0.60 (the version you are using) on top of Hadoop 2.6. I thought we had filed a JIRA for it but could not find one, so I went ahead and filed one now: https://issues.apache.org/jira/browse/SLIDER-846

Workaround: is this a production cluster? If not, can you disable RM HA and check whether you can access the AM UI and run all Slider command lines successfully? This is a basic test to make sure the problem really is caused by the RM HA setup. Once we have verified that, revert to RM HA.

I think we can make the Slider AM UI work in the RM HA setup as follows (we haven't tested this, so I'm not 100% sure it will work): use YARN labels to constrain the Slider AM to come up on the active RM node. Let me know if you want to try this route, and I would be happy to help with the details of setting it up.

-Gour

On 4/8/15, 9:17 AM, "Chackravarthy Esakkimuthu" wrote:

No, I think iptables is not enabled (will confirm). But the AM is running, other containers are running too, and I can see the Storm/HBase daemons running on those nodes. Does this mean the installation is successful? How do I check the status of the installation? I tried the slider command without success (please let me know if I am using it wrongly). storm-yarn-1 and hb1 are the names I used for the "slider create" command.
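Gour's workaround depends on knowing which ResourceManager is currently active. The broken AM link in this thread came from reaching the standby one. One way to check is to read the `haState` field from each RM's `/ws/v1/cluster/info` REST endpoint, assuming that endpoint is reachable on port 8088. The sketch below is hypothetical (the function names are made up).

- Redirects are refused, so a standby that forwards the request cannot be mistaken for the active RM.
- A non-JSON answer, such as the "This is standby RM. Redirecting..." page, is skipped.
- The fetcher is injectable, so the selection logic can be tried without a cluster.

```python
import json
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Surface 3xx responses as HTTPError instead of following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch_cluster_info(rm_address, timeout=5.0):
    """GET http://<rm_address>/ws/v1/cluster/info and decode the JSON body."""
    opener = urllib.request.build_opener(_NoRedirect)
    with opener.open(f"http://{rm_address}/ws/v1/cluster/info", timeout=timeout) as resp:
        return json.load(resp)


def find_active_rm(rm_addresses, fetch=fetch_cluster_info):
    """Return the first address whose clusterInfo.haState is "ACTIVE", or None."""
    for address in rm_addresses:
        try:
            info = fetch(address)
        except (OSError, ValueError):
            # Unreachable RM, refused redirect (HTTPError is an OSError),
            # or a non-JSON page such as the standby's redirect notice.
            continue
        if info.get("clusterInfo", {}).get("haState") == "ACTIVE":
            return address
    return None
```

Usage against the hosts in this thread would be `find_active_rm(["zs-aaa-001.nm.flipkart.com:8088", "zs-aaa-002.nm.flipkart.com:8088"])`. Whether a standby RM answers this endpoint directly or redirects it may depend on the Hadoop version, which is why both cases are treated as "not active".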
    /usr/hdp/current/slider-client/bin/./slider status storm-yarn-1
    2015-04-08 21:40:17,178 [main] INFO impl.TimelineClientImpl - Timeline service address: http://host2:8188/ws/v1/timeline/
    2015-04-08 21:40:17,782 [main] WARN shortcircuit.DomainSocketFactory - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
    2015-04-08 21:40:17,936 [main] INFO client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
    2015-04-08 21:40:17,970 [main] ERROR main.ServiceLauncher - Unknown application instance : storm-yarn-1
    2015-04-08 21:40:17,971 [main] INFO util.ExitUtil - Exiting with status 69

    /usr/hdp/current/slider-client/bin/./slider status hb1
    2015-04-08 21:40:31,344 [main] INFO impl.TimelineClientImpl - Timeline service address: http://host2:8188/ws/v1/timeline/
    2015-04-08 21:40:32,075 [main] WARN shortcircuit.DomainSocketFactory - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
    2015-04-08 21:40:32,263 [main] INFO client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
    2015-04-08 21:40:32,306 [main] ERROR main.ServiceLauncher - Unknown application instance : hb1
    2015-04-08 21:40:32,308 [main] INFO util.ExitUtil - Exiting with status 69

On Wed, Apr 8, 2015 at 7:14 PM, Jon Maron wrote:

Indications are that the AM has started. However, the AM URI you're attempting to attach to may be wrong, or something may be preventing the actual connection. Any chance iptables is enabled?

On Apr 8, 2015, at 3:44 AM, Gour Saha wrote:

Jon was right. I think Storm, unlike HBase, uses ${USER_NAME} for app_user instead of hard-coding it as yarn. So either user would have been fine.

One thing I noticed in the AM and RM URLs is that they link to both zs-aaa-001.nm.flipkart.com and zs-exp-01.nm.flipkart.com. Can you hand-edit the AM URL to try both host aliases? I am not sure that will work. If it doesn't, it would be great if you could send the entire AM logs.
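Gour's suggestion to hand-edit the AM URL amounts to swapping the host in the proxy URL while keeping the port and path. Here is a minimal sketch using Python's standard `urllib.parse`. The helper names are hypothetical; the URL and the two host aliases are the ones that appear in this thread.

```python
from urllib.parse import urlsplit, urlunsplit

# The AM link and the two host aliases quoted in this thread.
AM_URL = "http://zs-aaa-001.nm.flipkart.com:8088/proxy/application_1427882795362_0070/slideram"
HOST_ALIASES = ["zs-aaa-001.nm.flipkart.com", "zs-exp-01.nm.flipkart.com"]


def with_host(url, host):
    """Return `url` with its host replaced, keeping port, path, query and fragment.

    Any user:password part of the original URL is dropped.
    """
    parts = urlsplit(url)
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


def url_variants(url, hosts):
    """One candidate URL per host alias, in the order given."""
    return [with_host(url, host) for host in hosts]


for candidate in url_variants(AM_URL, HOST_ALIASES):
    print(candidate)
```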
-Gour

On Apr 7, 2015, at 11:08 PM, "Chackravarthy Esakkimuthu" wrote:

I tried running as the 'yarn' user, but it remains in the same state. The AM link is not working, and the AM logs are similar.

On Wed, Apr 8, 2015 at 2:14 AM, Gour Saha wrote:

In a non-secured cluster you should run as yarn. Can you do that and let us know how it goes? Also, to stop the existing Storm instance you created as the hdfs user, run stop first as the hdfs user:

    slider stop storm1

-Gour

On 4/7/15, 1:39 PM, "Chackravarthy Esakkimuthu" wrote:

This is not a secured cluster. And yes, I used the 'hdfs' user when running slider create.

On Wed, Apr 8, 2015 at 2:03 AM, Gour Saha wrote:

Which user are you running the slider create command as? It seems you are running as the hdfs user. Is this a secured cluster?

-Gour

On 4/7/15, 1:06 PM, "Chackravarthy Esakkimuthu" wrote:

Yes, RM HA has been set up in this cluster.

    Active: zs-aaa-001.nm.flipkart.com
    Standby: zs-aaa-002.nm.flipkart.com
    RM Link: http://zs-aaa-001.nm.flipkart.com:8088/cluster/scheduler <http://zs-exp-01.nm.flipkart.com:8088/cluster/scheduler>
    AM Link: http://zs-aaa-001.nm.flipkart.com:8088/proxy/application_1427882795362_0070/slideram <http://zs-exp-01.nm.flipkart.com:8088/proxy/application_1427882795362_0070/slideram>

On Wed, Apr 8, 2015 at 1:05 AM, Gour Saha wrote:

Sorry, I forgot that the AM link not working was the original issue. A few more things:
- It seems you have RM HA set up, right?
- Can you copy and paste the complete link of the RM UI and the URL of the ApplicationMaster (the link that is broken) with the actual hostnames?

-Gour

On 4/7/15, 11:43 AM, "Chackravarthy Esakkimuthu" wrote:

Since 5 containers are running, does that mean the Storm daemons are already up and running? Actually, the ApplicationMaster link is not working. It just blanks out, printing the following:

    This is standby RM.
    Redirecting to the current active RM: http://<host-name>:8088/proxy/application_1427882795362_0070/slideram

As for resources.json, I didn't make any changes and used a copy of resources-default.json, as follows:

    {
      "schema" : "http://example.org/specification/v2.0.0",
      "metadata" : { },
      "global" : {
        "yarn.log.include.patterns": "",
        "yarn.log.exclude.patterns": ""
      },
      "components": {
        "slider-appmaster": {
          "yarn.memory": "512"
        },
        "NIMBUS": {
          "yarn.role.priority": "1",
          "yarn.component.instances": "1",
          "yarn.memory": "2048"
        },
        "STORM_UI_SERVER": {
          "yarn.role.priority": "2",
          "yarn.component.instances": "1",
          "yarn.memory": "1278"
        },
        "DRPC_SERVER": {
          "yarn.role.priority": "3",
          "yarn.component.instances": "1",
          "yarn.memory": "1278"
        },
        "SUPERVISOR": {
          "yarn.role.priority": "4",
          "yarn.component.instances": "1",
          "yarn.memory": "3072"
        }
      }
    }

On Tue, Apr 7, 2015 at 11:52 PM, Gour Saha wrote:

Chackra sent the attachment directly to me. From what I see, the cluster resources (memory and cores) are abundant. I also see that only 1 app is running, the one we are trying to debug, and that it has 5 containers running. So definitely more containers than just the AM are running.

Can you click on the app master link and copy and paste the content of that page? No need for a screenshot. Also, please send your resources JSON file.

-Gour

On Apr 7, 2015, at 11:01 AM, "Jon Maron" wrote:

On Apr 7, 2015, at 1:36 PM, Chackravarthy Esakkimuthu wrote:

@Maron, I could not get the logs even though the application is still running.
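To relate the resources.json above to Gour's reading of the cluster metrics, you can total the containers and memory it requests. Below is a minimal sketch of that arithmetic. `memory_plan` is a hypothetical helper, not a Slider API, and the embedded JSON is the thread's file trimmed to the fields used. It assumes a component without `yarn.component.instances` (here only `slider-appmaster`) runs a single instance.

With that assumption the file asks for 5 containers, matching the 5 running containers Gour saw. YARN may round each request up to its scheduler's minimum allocation, so the real memory footprint can be larger than the total printed here.

```python
import json

# resources.json from this thread, trimmed to the fields this sketch reads.
THREAD_RESOURCES = """
{
  "components": {
    "slider-appmaster": {"yarn.memory": "512"},
    "NIMBUS": {"yarn.component.instances": "1", "yarn.memory": "2048"},
    "STORM_UI_SERVER": {"yarn.component.instances": "1", "yarn.memory": "1278"},
    "DRPC_SERVER": {"yarn.component.instances": "1", "yarn.memory": "1278"},
    "SUPERVISOR": {"yarn.component.instances": "1", "yarn.memory": "3072"}
  }
}
"""


def memory_plan(resources_json):
    """Return ({component: MB requested}, total MB, total containers)."""
    components = json.loads(resources_json)["components"]
    per_component, containers = {}, 0
    for name, spec in components.items():
        # Assumption: a component without yarn.component.instances runs once.
        instances = int(spec.get("yarn.component.instances", "1"))
        per_component[name] = int(spec.get("yarn.memory", "0")) * instances
        containers += instances
    return per_component, sum(per_component.values()), containers


plan, total_mb, containers = memory_plan(THREAD_RESOURCES)
print(containers, total_mb)  # 5 8188
```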
It's a 10-node cluster. I logged into one of the nodes and executed the command:

    sudo -u hdfs yarn logs -applicationId application_1427882795362_0070
    15/04/07 22:56:09 INFO impl.TimelineClientImpl: Timeline service address: http://$HOST:PORT/ws/v1/timeline/
    15/04/07 22:56:09 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
    /app-logs/hdfs/logs/application_1427882795362_0070 does not have any log files.

Can you log in to the cluster node and look at the logs directory (e.g. in an HDP install it would be under /hadoop/yarn/logs IIRC)?

@Gour, please find the attachment.

On Tue, Apr 7, 2015 at 10:57 PM, Gour Saha wrote:

Can you take a screenshot of your RM UI and send it over? It is usually available at a URI similar to http://c6410.ambari.apache.org:8088/cluster. I am specifically interested in the Cluster Metrics table.

-Gour

On 4/7/15, 10:17 AM, "Jon Maron" wrote:

On Apr 7, 2015, at 1:14 PM, Jon Maron wrote:

On Apr 7, 2015, at 1:08 PM, Chackravarthy Esakkimuthu wrote:

Thanks for the reply, guys! Container allocation happened successfully:

    RoleStatus{name='slider-appmaster', key=0, minimum=0, maximum=1, desired=1, actual=1,
    RoleStatus{name='STORM_UI_SERVER', key=2, minimum=0, maximum=1, desired=1, actual=1,
    RoleStatus{name='NIMBUS', key=1, minimum=0, maximum=1, desired=1, actual=1,
    RoleStatus{name='DRPC_SERVER', key=3, minimum=0, maximum=1, desired=1, actual=1,
    RoleStatus{name='SUPERVISOR', key=4, minimum=0, maximum=1, desired=1, actual=1,

Also, I have put some logs specific to a container
(nimbus) below. The same set of logs is available for the other roles too (except Supervisor, which has only the first 2 lines of the logs below):

    Installing NIMBUS on container_e04_1427882795362_0070_01_000002.
    Starting NIMBUS on container_e04_1427882795362_0070_01_000002.
    Registering component container_e04_1427882795362_0070_01_000002
    Requesting applied config for NIMBUS on container_e04_1427882795362_0070_01_000002.
    Received and processed config for container_e04_1427882795362_0070_01_000002___NIMBUS

Does this indicate some intermediate state?

@Maron, I didn't configure any port specifically; do I need to? Also, I don't see any error message in the AM logs about a port conflict.

My only concern was whether you were actually accession the web UIs at the correct host and port. If you are, then the next step is probably to look at the actual Storm/HBase logs. You can use the "yarn logs -applicationId ..." command.

*accessing* ;)

Thanks,
Chackra

On Tue, Apr 7, 2015 at 9:02 PM, Jon Maron wrote:

On Apr 7, 2015, at 11:03 AM, Billie Rinaldi wrote:

One thing you can check is whether your system has enough resources to allocate all the containers the app needs. You will see info like the following in the AM log (it is logged multiple times over the life of the AM). In this case, the master I requested was allocated but the tservers were not.

    RoleStatus{name='ACCUMULO_TSERVER', key=2, desired=2, actual=0, requested=2, releasing=0, failed=0, started=0, startFailed=0, completed=0, failureMessage=''}
    RoleStatus{name='ACCUMULO_MASTER', key=1, desired=1, actual=1, requested=0, releasing=0, failed=0, started=0, startFailed=0, completed=0, failureMessage=''}

You can also check the "Scheduler" link on the RM Web UI to get a sense of whether you are resource constrained.

Are you certain that you are attempting to invoke the correct port? The listening ports are dynamically allocated by Slider.
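Billie's allocation check and Jon's pointer to the container logs can both be scripted. Below is a minimal sketch under two assumptions:

- The NodeManager keeps local logs as `<log dir>/<application id>/<container id>/<file>`. The root, such as the `/hadoop/yarn/logs` mentioned in the thread, is site-specific.
- RoleStatus lines look like the ones quoted in this thread.

All function names are hypothetical. A role is flagged when its most recent `actual` count is below `desired`.

```python
import re
from pathlib import Path, PurePosixPath

ROLE_RE = re.compile(r"RoleStatus\{name='(?P<name>[^']+)'(?P<body>[^}]*)")
COUNTER_RE = re.compile(r"(\w+)=(\d+)")


def parse_role_status(line):
    """Return (role name, {counter: int}) for a RoleStatus line, else None."""
    match = ROLE_RE.search(line)
    if match is None:
        return None
    counters = {k: int(v) for k, v in COUNTER_RE.findall(match.group("body"))}
    return match.group("name"), counters


def unallocated_roles(am_log_text):
    """Roles whose most recent RoleStatus shows fewer actual than desired containers."""
    latest = {}
    for line in am_log_text.splitlines():
        parsed = parse_role_status(line)
        if parsed is not None:
            name, counters = parsed
            latest[name] = counters  # later reports overwrite earlier ones
    return sorted(
        name for name, c in latest.items() if c.get("actual", 0) < c.get("desired", 0)
    )


def group_by_container(relative_paths):
    """Group '<container id>/<file>' paths into {container id: sorted file names}."""
    groups = {}
    for rel in relative_paths:
        parts = PurePosixPath(rel).parts
        if len(parts) == 2 and parts[0].startswith("container_"):
            groups.setdefault(parts[0], []).append(parts[1])
    return {cid: sorted(files) for cid, files in sorted(groups.items())}


def container_log_files(log_root, application_id):
    """List per-container log files for one application on the local node."""
    app_dir = Path(log_root) / application_id
    if not app_dir.is_dir():
        return {}
    return group_by_container(
        p.relative_to(app_dir).as_posix() for p in app_dir.glob("*/*") if p.is_file()
    )
```

On Billie's example, `unallocated_roles` flags only ACCUMULO_TSERVER. The five RoleStatus lines Chackra posted would flag nothing, since each shows `desired=1, actual=1`.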
On Tue, Apr 7, 2015 at 3:29 AM, Chackravarthy Esakkimuthu wrote:

Hi All,

I am new to Apache Slider and would like to contribute. To start with, I am trying to run "storm" and "hbase" on YARN using Slider, following the guide: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.0/YARN_RM_v22/running_applications_on_slider/index.html#Item1.1

In both cases (Storm and HBase), the ApplicationMaster gets launched and is still running, but the ApplicationMaster link does not work. I don't see any errors in the AM logs. How do I debug this? Please help me. If there is another mail thread about this, please point me to it.

Thanks in advance.

Thanks,
Chackra
