Re: Need help in starting storm on yarn using slider

Jon Maron Thu, 09 Apr 2015 06:32:28 -0700

Is the registry enabled?  Look for the following properties:


  1.  hadoop.registry.zk.quorum
  2.  hadoop.registry.rm.enabled=true

IIRC, both should be set in yarn-site.xml
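
For reference, a minimal sketch of how the two properties above could be checked in a copy of yarn-site.xml. The function names, the sample XML and the ZooKeeper host are illustrative assumptions, not taken from the cluster in this thread.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config file."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def registry_problems(props):
    """List the registry settings from this thread that are missing or disabled."""
    problems = []
    if not props.get("hadoop.registry.zk.quorum"):
        problems.append("hadoop.registry.zk.quorum is not set")
    if props.get("hadoop.registry.rm.enabled", "").lower() != "true":
        problems.append("hadoop.registry.rm.enabled is not true")
    return problems


# Hypothetical sample; in practice read the text of the real yarn-site.xml.
SAMPLE = """<configuration>
  <property><name>hadoop.registry.zk.quorum</name><value>zk1.example.com:2181</value></property>
  <property><name>hadoop.registry.rm.enabled</name><value>false</value></property>
</configuration>"""

print(registry_problems(read_properties(SAMPLE)))
```

An empty list would suggest both settings are present; it does not prove the registry is reachable.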

On Apr 9, 2015, at 7:51 AM, Chackravarthy Esakkimuthu 
<[email protected]<mailto:[email protected]>> wrote:

Gour,

Yes, there is progress :) I disabled RM HA and then installed the same
storm cluster. Now the AM link works and provides the info.
But I still cannot tell whether the storm installation was
successful, because when I tried to get the status of the installed cluster
using the storm-slider client, it failed as follows:

/usr/hdp/2.2.0.0-2041/storm-slider-client/bin/storm-slider --app storm2 quicklinks
2015-04-09 17:14:42,914 [main] INFO  impl.TimelineClientImpl - Timeline service address: http://host2:8188/ws/v1/timeline/
2015-04-09 17:14:43,506 [main] WARN  shortcircuit.DomainSocketFactory - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
2015-04-09 17:14:43,513 [main] INFO  client.RMProxy - Connecting to ResourceManager at host2/xx.xx.xxx.xxx:8050
2015-04-09 17:14:43,608 [main] INFO  imps.CuratorFrameworkImpl - Starting
2015-04-09 17:14:43,639 [main-EventThread] INFO  state.ConnectionStateManager - State change: CONNECTED
2015-04-09 17:14:43,639 [ConnectionStateManager-0] WARN  state.ConnectionStateManager - There are no ConnectionStateListeners registered.
2015-04-09 17:14:44,667 [main] ERROR main.ServiceLauncher - /registry/users/chackaravarthy.e/services/org-apache-slider/storm2
2015-04-09 17:14:44,671 [main] INFO  util.ExitUtil - Exiting with status 44
Failed to read slider deployed storm config

Thanks,
Chackra

On Wed, Apr 8, 2015 at 11:16 PM, Chackravarthy Esakkimuthu <
[email protected]<mailto:[email protected]>> wrote:

sure Gour, I will check with RM HA disabled and then will get in touch
with you to test with workaround enabling RM HA.
Thanks a lot!!

On Wed, Apr 8, 2015 at 10:59 PM, Gour Saha 
<[email protected]<mailto:[email protected]>> wrote:

Chackra,

We believe you are running into a redirection issue when RM HA is set up -
https://issues.apache.org/jira/browse/YARN-1525
https://issues.apache.org/jira/browse/YARN-1811

These were fixed in Hadoop 2.6 (the version you have). But we still found
issues with the Slider AM UI in Slider version 0.60 (the version you are
using) on top of Hadoop 2.6.


I thought we had filed a JIRA on it, but could not find one. I went ahead and
filed one now -
https://issues.apache.org/jira/browse/SLIDER-846



Workaround -
Is this a production cluster? If not, can you disable RM HA and check if
you can access the AM UI and also run all slider command lines
successfully? This is a basic test to ensure that this is indeed
happening because of the RM HA setup.

Once we verify the above, switch back to RM HA again. I think we can make
the Slider AM UI work in the RM HA setup by doing this (we haven’t tested
this so not 100% sure it will work) -

In the RM HA setup we can use YARN labels and constrain the Slider AM to
come up on the active RM node. Let me know if you want to try this route
and I would be happy to help you out with details on how to set this up.


-Gour

On 4/8/15, 9:17 AM, "Chackravarthy Esakkimuthu" <[email protected]>
wrote:

No, iptables is not enabled, I think (will confirm).
But the AM is running, other containers are running too, and I could see
storm/hbase daemons running on those nodes.
Does this mean the installation is successful? How do I check the status of
the installation?

I tried the slider command with no success (please let me know if I am
using it wrongly).
- storm-yarn-1 and hb1 are the names which I used for the "slider create"
command.

/usr/hdp/current/slider-client/bin/./slider status storm-yarn-1
2015-04-08 21:40:17,178 [main] INFO  impl.TimelineClientImpl - Timeline service address: http://host2:8188/ws/v1/timeline/
2015-04-08 21:40:17,782 [main] WARN  shortcircuit.DomainSocketFactory - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
2015-04-08 21:40:17,936 [main] INFO  client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
2015-04-08 21:40:17,970 [main] ERROR main.ServiceLauncher - Unknown application instance : storm-yarn-1
2015-04-08 21:40:17,971 [main] INFO  util.ExitUtil - Exiting with status 69

/usr/hdp/current/slider-client/bin/./slider status hb1
2015-04-08 21:40:31,344 [main] INFO  impl.TimelineClientImpl - Timeline service address: http://host2:8188/ws/v1/timeline/
2015-04-08 21:40:32,075 [main] WARN  shortcircuit.DomainSocketFactory - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
2015-04-08 21:40:32,263 [main] INFO  client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
2015-04-08 21:40:32,306 [main] ERROR main.ServiceLauncher - Unknown application instance : hb1
2015-04-08 21:40:32,308 [main] INFO  util.ExitUtil - Exiting with status 69


On Wed, Apr 8, 2015 at 7:14 PM, Jon Maron <[email protected]>
wrote:

Indications seem to be that the AM is started but the AM URI you’re
attempting to attach to may be mistaken or there may be something
preventing the actual connection.  Any chance iptables is enabled?


On Apr 8, 2015, at 3:44 AM, Gour Saha <[email protected]> wrote:

Jon was right. I think Storm uses ${USER_NAME} for app_user instead of
hard coding it as yarn, unlike hbase. So either user should have been fine.

One thing I saw in the AM and RM URLs is that they link to
zs-aaa-001.nm.flipkart.com and zs-exp-01.nm.flipkart.com. Can you hand
edit the AM URL to try both the host aliases?

I am not sure the above will work; if it does not, it would be great if you
could send the entire AM logs.

-Gour

- Sent from my iPhone

On Apr 7, 2015, at 11:08 PM, "Chackravarthy Esakkimuthu" <
[email protected]> wrote:

I tried running as the 'yarn' user, but it remains in the same state.
The AM link is not working, and the AM logs are similar.

On Wed, Apr 8, 2015 at 2:14 AM, Gour Saha <[email protected]>
wrote:

In a non-secured cluster you should run as yarn. Can you do that and let
us know how it goes?

Also you can stop your existing storm instance in hdfs user (run as hdfs
user) by running stop first -
slider stop storm1

-Gour

On 4/7/15, 1:39 PM, "Chackravarthy Esakkimuthu" <[email protected]>
wrote:

This is not a secured cluster.
And yes, I used 'hdfs' user while running slider create.

On Wed, Apr 8, 2015 at 2:03 AM, Gour Saha <[email protected]>
wrote:

Which user are you running the slider create command as? Seems like you
are running as hdfs user. Is this a secured cluster?

-Gour

On 4/7/15, 1:06 PM, "Chackravarthy Esakkimuthu" <
[email protected]>
wrote:

Yes, RM HA has been set up in this cluster.

Active : zs-aaa-001.nm.flipkart.com
Standby : zs-aaa-002.nm.flipkart.com

RM Link :
http://zs-aaa-001.nm.flipkart.com:8088/cluster/scheduler
<http://zs-exp-01.nm.flipkart.com:8088/cluster/scheduler>

AM Link :
http://zs-aaa-001.nm.flipkart.com:8088/proxy/application_1427882795362_0070/slideram
<http://zs-exp-01.nm.flipkart.com:8088/proxy/application_1427882795362_0070/slideram>

On Wed, Apr 8, 2015 at 1:05 AM, Gour Saha
<[email protected]>
wrote:

Sorry, I forgot that the AM link not working was the original issue.

A few more things -
- Seems like you have RM HA set up, right?
- Can you copy and paste the complete link of the RM UI and the URL of the
ApplicationMaster (the link which is broken) with actual hostnames?


-Gour

On 4/7/15, 11:43 AM, "Chackravarthy Esakkimuthu" <[email protected]>
wrote:

Since 5 containers are running, does that mean the Storm daemons are
already up and running?

Actually, the ApplicationMaster link is not working. It just blanks out,
printing the following:

This is standby RM. Redirecting to the current active RM:
http://<host-name>:8088/proxy/application_1427882795362_0070/slideram

And for resources.json, I didn't make any changes and used a copy of
resources-default.json as follows:


{
  "schema" : "http://example.org/specification/v2.0.0",
  "metadata" : {
  },
  "global" : {
    "yarn.log.include.patterns": "",
    "yarn.log.exclude.patterns": ""
  },
  "components": {
    "slider-appmaster": {
      "yarn.memory": "512"
    },
    "NIMBUS": {
      "yarn.role.priority": "1",
      "yarn.component.instances": "1",
      "yarn.memory": "2048"
    },
    "STORM_UI_SERVER": {
      "yarn.role.priority": "2",
      "yarn.component.instances": "1",
      "yarn.memory": "1278"
    },
    "DRPC_SERVER": {
      "yarn.role.priority": "3",
      "yarn.component.instances": "1",
      "yarn.memory": "1278"
    },
    "SUPERVISOR": {
      "yarn.role.priority": "4",
      "yarn.component.instances": "1",
      "yarn.memory": "3072"
    }
  }
}
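
As a rough sanity check against the Cluster Metrics table, the memory this resources.json asks for can be totalled with a short sketch like the one below. It assumes the slider-appmaster entry counts as one instance (the file gives no instance count for it), and the function name is hypothetical. YARN may round each container request up, so the real footprint can be larger than this sum.

```python
import json

# Only the "components" part of the resources.json quoted above is needed here.
RESOURCES_JSON = """
{
  "components": {
    "slider-appmaster": {"yarn.memory": "512"},
    "NIMBUS": {"yarn.component.instances": "1", "yarn.memory": "2048"},
    "STORM_UI_SERVER": {"yarn.component.instances": "1", "yarn.memory": "1278"},
    "DRPC_SERVER": {"yarn.component.instances": "1", "yarn.memory": "1278"},
    "SUPERVISOR": {"yarn.component.instances": "1", "yarn.memory": "3072"}
  }
}
"""


def requested_memory_mb(resources):
    """Sum yarn.memory times instance count over all components."""
    total = 0
    for spec in resources.get("components", {}).values():
        # Assumption: a component without an explicit count runs one instance.
        instances = int(spec.get("yarn.component.instances", "1"))
        total += instances * int(spec["yarn.memory"])
    return total


total_mb = requested_memory_mb(json.loads(RESOURCES_JSON))
print(total_mb)  # 8188
```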



On Tue, Apr 7, 2015 at 11:52 PM, Gour Saha <
[email protected]>
wrote:

Chackra sent the attachment directly to me. From what I see, the cluster
resources (memory and cores) are abundant.

But I also see that only 1 app is running, which is the one we are trying
to debug, and 5 containers are running. So definitely more containers
than just the AM are running.

Can you click on the app master link and copy and paste the content of that
page? No need for a screenshot. Also please send your resources JSON
file.

-Gour

- Sent from my iPhone

On Apr 7, 2015, at 11:01 AM, "Jon Maron"
<[email protected]>
wrote:


On Apr 7, 2015, at 1:36 PM, Chackravarthy Esakkimuthu <
[email protected]<mailto:[email protected]>> wrote:

@Maron, I could not get the logs even though the application is still
running.
It's a 10 node cluster and I logged into one of the nodes and executed
the command:

sudo -u hdfs yarn logs -applicationId application_1427882795362_0070
15/04/07 22:56:09 INFO impl.TimelineClientImpl: Timeline service address: http://$HOST:PORT/ws/v1/timeline/
15/04/07 22:56:09 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
/app-logs/hdfs/logs/application_1427882795362_0070does not have any log files.

Can you log in to the cluster node and look at the logs directory (e.g.
in an HDP install it would be under /hadoop/yarn/logs IIRC)?



@Gour, Please find the attachment.

On Tue, Apr 7, 2015 at 10:57 PM, Gour Saha
<[email protected]
<mailto:[email protected]>> wrote:
Can you take a screenshot of your RM UI and send it over? It is usually
available in a URI similar to http://c6410.ambari.apache.org:8088/cluster.
I am specifically interested in seeing the Cluster Metrics table.

-Gour

On 4/7/15, 10:17 AM, "Jon Maron"
<[email protected]<mailto:
[email protected]>> wrote:


On Apr 7, 2015, at 1:14 PM, Jon Maron
<[email protected]<mailto:
[email protected]>> wrote:


On Apr 7, 2015, at 1:08 PM, Chackravarthy Esakkimuthu
<[email protected]<mailto:[email protected]>>
wrote:

Thanks for the reply guys!
Container allocation happened successfully.

RoleStatus{name='slider-appmaster', key=0, minimum=0, maximum=1, desired=1, actual=1,
RoleStatus{name='STORM_UI_SERVER', key=2, minimum=0, maximum=1, desired=1, actual=1,
RoleStatus{name='NIMBUS', key=1, minimum=0, maximum=1, desired=1, actual=1,
RoleStatus{name='DRPC_SERVER', key=3, minimum=0, maximum=1, desired=1, actual=1,
RoleStatus{name='SUPERVISOR', key=4, minimum=0, maximum=1, desired=1, actual=1,

Also, I have put some logs specific to a container (nimbus) below. The same
set of logs is available for the other roles also (except Supervisor, which
has only the first 2 lines of the logs below).

Installing NIMBUS on container_e04_1427882795362_0070_01_000002.
Starting NIMBUS on container_e04_1427882795362_0070_01_000002.
Registering component container_e04_1427882795362_0070_01_000002
Requesting applied config for NIMBUS on container_e04_1427882795362_0070_01_000002.
Received and processed config for container_e04_1427882795362_0070_01_000002___NIMBUS

Does this result in any intermediate state?

@Maron, I didn't configure any port specifically. Do I need to? Also, I
don't see any error message in the AM logs regarding a port conflict.

My only concern was whether you were actually accession the web UIs at
the correct host and port.  If you are, then the next step is probably to
look at the actual storm/hbase logs.  You can use the
"yarn logs -applicationId .." command.

*accessing* ;)



Thanks,
Chackra



On Tue, Apr 7, 2015 at 9:02 PM, Jon Maron
<[email protected]
<mailto:[email protected]>>
wrote:


On Apr 7, 2015, at 11:03 AM, Billie Rinaldi
<[email protected]<mailto:[email protected]>> wrote:

One thing you can check is whether your system has enough resources to
allocate all the containers the app needs.  You will see info like the
following in the AM log (it will be logged multiple times over the life of
the AM).  In this case, the master I requested was allocated but the
tservers were not.
RoleStatus{name='ACCUMULO_TSERVER', key=2, desired=2, actual=0, requested=2, releasing=0, failed=0, started=0, startFailed=0, completed=0, failureMessage=''}
RoleStatus{name='ACCUMULO_MASTER', key=1, desired=1, actual=1, requested=0, releasing=0, failed=0, started=0, startFailed=0, completed=0, failureMessage=''}

You can also check the "Scheduler" link on the RM Web UI to get a sense of
whether you are resource constrained.
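
The RoleStatus check described above can be sketched in a few lines: pull the RoleStatus entries out of the AM log text and list the roles whose actual count is below the desired count. The regular expressions and function names are assumptions based only on the two sample lines quoted in this message, so other Slider versions may log a different layout.

```python
import re

# Matches one RoleStatus{...} entry; assumes the layout of the lines quoted above.
ROLE_RE = re.compile(r"RoleStatus\{name='(?P<name>[^']+)'(?P<rest>[^}]*)\}")
FIELD_RE = re.compile(r"(\w+)=(\d+)")


def parse_role_status(log_text):
    """Return {role name: {field: int}} for each RoleStatus entry found."""
    roles = {}
    for match in ROLE_RE.finditer(log_text):
        fields = {key: int(val) for key, val in FIELD_RE.findall(match.group("rest"))}
        roles[match.group("name")] = fields  # later entries overwrite earlier ones
    return roles


def unallocated(roles):
    """Names of roles that have fewer containers than they asked for."""
    return sorted(
        name for name, f in roles.items() if f.get("actual", 0) < f.get("desired", 0)
    )


SAMPLE_LOG = (
    "RoleStatus{name='ACCUMULO_TSERVER', key=2, desired=2, actual=0, requested=2, "
    "releasing=0, failed=0, started=0, startFailed=0, completed=0, failureMessage=''}\n"
    "RoleStatus{name='ACCUMULO_MASTER', key=1, desired=1, actual=1, requested=0, "
    "releasing=0, failed=0, started=0, startFailed=0, completed=0, failureMessage=''}\n"
)

print(unallocated(parse_role_status(SAMPLE_LOG)))  # ['ACCUMULO_TSERVER']
```

Because the AM logs these entries repeatedly, the sketch keeps only the last entry seen for each role.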

Are you certain that you are attempting to invoke the correct port? The
listening ports are dynamically allocated by Slider.



On Tue, Apr 7, 2015 at 3:29 AM, Chackravarthy
Esakkimuthu <
[email protected]<mailto:[email protected]>>
wrote:

Hi All,

I am new to Apache Slider and would like to contribute.

Just to start with, I am trying out running "storm" and "hbase" on yarn
using slider, following the guide:

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.0/YARN_RM_v22/running_applications_on_slider/index.html#Item1.1

In both cases (storm and hbase), the ApplicationMaster gets launched
and is still running, but the ApplicationMaster link is not working, and
from the AM logs I don't see any errors.

How do I debug this? Please help me.
In case there is another mail thread about this, please point it
out to me. Thanks in advance.

Thanks,
Chackra
