Jon was right. I think Storm uses ${USER_NAME} for app_user instead of
hard-coding it as yarn, unlike hbase. So either user would have been fine.
One thing I saw in the AM and RM URLs is that they link to
zs-aaa-001.nm.flipkart.com and zs-exp-01.nm.flipkart.com. Can you hand-edit the
AM URL to try both of the host aliases?
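If it helps, a minimal sketch of that hand edit (the `swap_host` helper name is mine, and the hostnames and application id are copied from the links quoted below):

```python
from urllib.parse import urlparse, urlunparse

def swap_host(url, new_host):
    """Return the same URL with only the hostname swapped, keeping port and path."""
    parts = urlparse(url)
    netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
    return urlunparse(parts._replace(netloc=netloc))

am_url = ("http://zs-aaa-001.nm.flipkart.com:8088"
          "/proxy/application_1427882795362_0070/slideram")
print(swap_host(am_url, "zs-exp-01.nm.flipkart.com"))
```

Trying the resulting URL with each alias should tell us which hostname the proxy link ought to be using.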
I am not sure the above will work, in which case it would be great if you could
send the entire AM logs.
-Gour
- Sent from my iPhone
> On Apr 7, 2015, at 11:08 PM, "Chackravarthy Esakkimuthu"
> <[email protected]> wrote:
>
> Tried running as the 'yarn' user, but it remains in the same state.
> The AM link is not working, and the AM logs are similar.
>
> On Wed, Apr 8, 2015 at 2:14 AM, Gour Saha <[email protected]> wrote:
>
>> In a non-secured cluster you should run as yarn. Can you do that and let
>> us know how it goes?
>>
>> Also, you can stop your existing storm instance (run as the hdfs user) by
>> running stop first -
>> slider stop storm1
>>
>> -Gour
>>
>> On 4/7/15, 1:39 PM, "Chackravarthy Esakkimuthu" <[email protected]>
>> wrote:
>>
>>> This is not a secured cluster.
>>> And yes, I used 'hdfs' user while running slider create.
>>>
>>>> On Wed, Apr 8, 2015 at 2:03 AM, Gour Saha <[email protected]> wrote:
>>>>
>>>> Which user are you running the slider create command as? Seems like you
>>>> are running as hdfs user. Is this a secured cluster?
>>>>
>>>> -Gour
>>>>
>>>> On 4/7/15, 1:06 PM, "Chackravarthy Esakkimuthu" <[email protected]>
>>>> wrote:
>>>>
>>>>> yes, RM HA has been setup in this cluster.
>>>>>
>>>>> Active : zs-aaa-001.nm.flipkart.com
>>>>> Standby : zs-aaa-002.nm.flipkart.com
>>>>>
>>>>> RM Link : http://zs-aaa-001.nm.flipkart.com:8088/cluster/scheduler
>>>>> <http://zs-exp-01.nm.flipkart.com:8088/cluster/scheduler>
>>>>>
>>>>> AM Link :
>>>>> http://zs-aaa-001.nm.flipkart.com:8088/proxy/application_1427882795362_0070/slideram
>>>>> <http://zs-exp-01.nm.flipkart.com:8088/proxy/application_1427882795362_0070/slideram>
>>>>>
>>>>>> On Wed, Apr 8, 2015 at 1:05 AM, Gour Saha <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Sorry, I forgot that the AM link not working was the original issue.
>>>>>>
>>>>>> A few more things -
>>>>>> - Seems like you have RM HA set up, right?
>>>>>> - Can you copy-paste the complete link of the RM UI and the URL of the
>>>>>> ApplicationMaster (the link which is broken), with actual hostnames?
>>>>>>
>>>>>>
>>>>>> -Gour
>>>>>>
>>>>>> On 4/7/15, 11:43 AM, "Chackravarthy Esakkimuthu" <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Since 5 containers are running, does that mean the Storm daemons are
>>>>>>> already up and running?
>>>>>>>
>>>>>>>
>>>>>>> Actually the ApplicationMaster link is not working. It just blanks out,
>>>>>>> printing the following:
>>>>>>>
>>>>>>> This is standby RM. Redirecting to the current active RM:
>>>>>>> http://<host-name>:8088/proxy/application_1427882795362_0070/slideram
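The active RM can also be pulled out of that notice mechanically; a small sketch (the `active_rm_host` helper is mine, and the sample text is stitched from the notice above plus the active RM hostname given earlier in the thread):

```python
import re

def active_rm_host(redirect_body):
    """Extract the active RM hostname from the standby RM's redirect notice,
    or return None if the text is not such a notice."""
    m = re.search(r"Redirecting to the current active RM:\s*https?://([^:/\s]+)",
                  redirect_body)
    return m.group(1) if m else None

notice = ("This is standby RM. Redirecting to the current active RM: "
          "http://zs-aaa-001.nm.flipkart.com:8088/proxy/"
          "application_1427882795362_0070/slideram")
print(active_rm_host(notice))  # zs-aaa-001.nm.flipkart.com
```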
>>>>>>>
>>>>>>>
>>>>>>> And for resources.json, I didn't make any change and used a copy of
>>>>>>> resources-default.json, as follows:
>>>>>>>
>>>>>>>
>>>>>>> {
>>>>>>>   "schema" : "http://example.org/specification/v2.0.0",
>>>>>>>   "metadata" : {
>>>>>>>   },
>>>>>>>   "global" : {
>>>>>>>     "yarn.log.include.patterns": "",
>>>>>>>     "yarn.log.exclude.patterns": ""
>>>>>>>   },
>>>>>>>   "components": {
>>>>>>>     "slider-appmaster": {
>>>>>>>       "yarn.memory": "512"
>>>>>>>     },
>>>>>>>     "NIMBUS": {
>>>>>>>       "yarn.role.priority": "1",
>>>>>>>       "yarn.component.instances": "1",
>>>>>>>       "yarn.memory": "2048"
>>>>>>>     },
>>>>>>>     "STORM_UI_SERVER": {
>>>>>>>       "yarn.role.priority": "2",
>>>>>>>       "yarn.component.instances": "1",
>>>>>>>       "yarn.memory": "1278"
>>>>>>>     },
>>>>>>>     "DRPC_SERVER": {
>>>>>>>       "yarn.role.priority": "3",
>>>>>>>       "yarn.component.instances": "1",
>>>>>>>       "yarn.memory": "1278"
>>>>>>>     },
>>>>>>>     "SUPERVISOR": {
>>>>>>>       "yarn.role.priority": "4",
>>>>>>>       "yarn.component.instances": "1",
>>>>>>>       "yarn.memory": "3072"
>>>>>>>     }
>>>>>>>   }
>>>>>>> }
>>>>>>>
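As a quick sanity check on whether the cluster can fit these components, the memory the resources.json above will request can be totaled (a sketch; the values are copied from the file, and a single AM instance is assumed since slider-appmaster declares no instance count):

```python
# Memory (MB) and instance counts copied from the resources.json above;
# slider-appmaster is assumed to run a single instance.
components = {
    "slider-appmaster": {"yarn.component.instances": "1", "yarn.memory": "512"},
    "NIMBUS":           {"yarn.component.instances": "1", "yarn.memory": "2048"},
    "STORM_UI_SERVER":  {"yarn.component.instances": "1", "yarn.memory": "1278"},
    "DRPC_SERVER":      {"yarn.component.instances": "1", "yarn.memory": "1278"},
    "SUPERVISOR":       {"yarn.component.instances": "1", "yarn.memory": "3072"},
}

total_mb = sum(int(c["yarn.component.instances"]) * int(c["yarn.memory"])
               for c in components.values())
print(total_mb)  # 8188
```

Well under the capacity of a 10-node cluster with abundant memory, which matches Gour's observation below that resources are not the problem.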
>>>>>>>
>>>>>>>
>>>>>>>> On Tue, Apr 7, 2015 at 11:52 PM, Gour Saha <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Chackra sent the attachment directly to me. From what I see, the
>>>>>>>> cluster resources (memory and cores) are abundant.
>>>>>>>>
>>>>>>>> But I also see that only 1 app is running, which is the one we are
>>>>>>>> trying to debug, and 5 containers are running. So definitely more
>>>>>>>> containers than just the AM are running.
>>>>>>>>
>>>>>>>> Can you click on the app master link and copy-paste the content of
>>>>>>>> that page? No need for a screenshot. Also please send your resources
>>>>>>>> JSON file.
>>>>>>>>
>>>>>>>> -Gour
>>>>>>>>
>>>>>>>> - Sent from my iPhone
>>>>>>>>
>>>>>>>>> On Apr 7, 2015, at 11:01 AM, "Jon Maron" <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On Apr 7, 2015, at 1:36 PM, Chackravarthy Esakkimuthu
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> @Maron, I could not get the logs even though the application is
>>>>>>>>> still running.
>>>>>>>>> It's a 10-node cluster; I logged into one of the nodes and executed
>>>>>>>>> the command:
>>>>>>>>>
>>>>>>>>> sudo -u hdfs yarn logs -applicationId application_1427882795362_0070
>>>>>>>>> 15/04/07 22:56:09 INFO impl.TimelineClientImpl: Timeline service
>>>>>>>>> address: http://$HOST:PORT/ws/v1/timeline/
>>>>>>>>> 15/04/07 22:56:09 INFO client.ConfiguredRMFailoverProxyProvider:
>>>>>>>>> Failing over to rm2
>>>>>>>>> /app-logs/hdfs/logs/application_1427882795362_0070 does not have any
>>>>>>>>> log files.
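For reference, the path in that message follows YARN's aggregated-log layout, `<remote-log-dir>/<user>/logs/<appId>`; a tiny sketch (the helper name is mine, and the `/app-logs` root is whatever yarn.nodemanager.remote-app-log-dir is set to on your cluster):

```python
def aggregated_log_dir(user, app_id, root="/app-logs"):
    """Build the HDFS directory that `yarn logs` reads for an application.
    `root` must match yarn.nodemanager.remote-app-log-dir on the cluster."""
    return f"{root}/{user}/logs/{app_id}"

print(aggregated_log_dir("hdfs", "application_1427882795362_0070"))
# /app-logs/hdfs/logs/application_1427882795362_0070
```

Aggregated logs typically land in that directory only after containers finish (with log aggregation enabled), which is one reason `yarn logs` can report no files while the app is still running.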
>>>>>>>>>
>>>>>>>>> Can you log in to the cluster node and look at the logs directory
>>>>>>>>> (e.g. in an HDP install it would be under /hadoop/yarn/logs, IIRC)?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> @Gour, Please find the attachment.
>>>>>>>>>
>>>>>>>>> On Tue, Apr 7, 2015 at 10:57 PM, Gour Saha <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>> Can you take a screenshot of your RM UI and send it over? It is
>>>>>>>>> usually available at a URL similar to
>>>>>>>>> http://c6410.ambari.apache.org:8088/cluster.
>>>>>>>>> I am specifically interested in seeing the Cluster Metrics table.
>>>>>>>>>
>>>>>>>>> -Gour
>>>>>>>>>
>>>>>>>>>> On 4/7/15, 10:17 AM, "Jon Maron" <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On Apr 7, 2015, at 1:14 PM, Jon Maron <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> On Apr 7, 2015, at 1:08 PM, Chackravarthy Esakkimuthu
>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the reply, guys!
>>>>>>>>>>>> Container allocation happened successfully.
>>>>>>>>>>>>
>>>>>>>>>>>> *RoleStatus{name='slider-appmaster', key=0, minimum=0, maximum=1,
>>>>>>>>>>>> desired=1, actual=1,*
>>>>>>>>>>>> *RoleStatus{name='STORM_UI_SERVER', key=2, minimum=0, maximum=1,
>>>>>>>>>>>> desired=1, actual=1,*
>>>>>>>>>>>> *RoleStatus{name='NIMBUS', key=1, minimum=0, maximum=1, desired=1,
>>>>>>>>>>>> actual=1,*
>>>>>>>>>>>> *RoleStatus{name='DRPC_SERVER', key=3, minimum=0, maximum=1,
>>>>>>>>>>>> desired=1, actual=1,*
>>>>>>>>>>>> *RoleStatus{name='SUPERVISOR', key=4, minimum=0, maximum=1,
>>>>>>>>>>>> desired=1, actual=1,*
>>>>>>>>>>>>
>>>>>>>>>>>> Also, I have put some logs specific to a container (nimbus). The
>>>>>>>>>>>> same set of logs is available for the other roles too (except
>>>>>>>>>>>> SUPERVISOR, which has only the first 2 lines of the logs below):
>>>>>>>>>>>>
>>>>>>>>>>>> *Installing NIMBUS on container_e04_1427882795362_0070_01_000002.*
>>>>>>>>>>>> *Starting NIMBUS on container_e04_1427882795362_0070_01_000002.*
>>>>>>>>>>>> *Registering component container_e04_1427882795362_0070_01_000002*
>>>>>>>>>>>> *Requesting applied config for NIMBUS on
>>>>>>>>>>>> container_e04_1427882795362_0070_01_000002.*
>>>>>>>>>>>> *Received and processed config for
>>>>>>>>>>>> container_e04_1427882795362_0070_01_000002___NIMBUS*
>>>>>>>>>>>>
>>>>>>>>>>>> Does this indicate some intermediate state?
>>>>>>>>>>>>
>>>>>>>>>>>> @Maron, I didn't configure any port specifically.. do I need to?
>>>>>>>>>>>> Also, I don't see any error msg in the AM logs w.r.t. a port
>>>>>>>>>>>> conflict.
>>>>>>>>>>>
>>>>>>>>>>> My only concern was whether you were actually accession the web
>>>>>>>>>>> UIs at the correct host and port. If you are, then the next step
>>>>>>>>>>> is probably to look at the actual storm/hbase logs. You can use
>>>>>>>>>>> the "yarn logs -applicationId ..." command.
>>>>>>>>>>
>>>>>>>>>> *accessing* ;)
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Chackra
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Apr 7, 2015 at 9:02 PM, Jon Maron <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Apr 7, 2015, at 11:03 AM, Billie Rinaldi
>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> One thing you can check is whether your system has enough
>>>>>>>>>>>>>> resources to allocate all the containers the app needs. You
>>>>>>>>>>>>>> will see info like the following in the AM log (it will be
>>>>>>>>>>>>>> logged multiple times over the life of the AM). In this case,
>>>>>>>>>>>>>> the master I requested was allocated but the tservers were not.
>>>>>>>>>>>>>> RoleStatus{name='ACCUMULO_TSERVER', key=2, desired=2, actual=0,
>>>>>>>>>>>>>> requested=2, releasing=0, failed=0, started=0, startFailed=0,
>>>>>>>>>>>>>> completed=0, failureMessage=''}
>>>>>>>>>>>>>> RoleStatus{name='ACCUMULO_MASTER', key=1, desired=1, actual=1,
>>>>>>>>>>>>>> requested=0, releasing=0, failed=0, started=0, startFailed=0,
>>>>>>>>>>>>>> completed=0, failureMessage=''}
>>>>>>>>>>>>>
>>>>>>>>>>>>> You can also check the "Scheduler" link on the RM Web UI to get
>>>>>>>>>>>>> a sense of whether you are resource-constrained.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Are you certain that you are attempting to invoke the correct
>>>>>>>>>>>>> port? The listening ports are dynamically allocated by Slider.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Apr 7, 2015 at 3:29 AM, Chackravarthy Esakkimuthu
>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am new to Apache slider and would like to contribute.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Just to start with, I am trying out running "storm" and
>>>>>>>>>>>>>>> "hbase" on yarn using slider, following the guide:
>>>>>>>>>>>>>>> http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.0/YARN_RM_v22/running_applications_on_slider/index.html#Item1.1
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In both cases (storm and hbase), the ApplicationMaster gets
>>>>>>>>>>>>>>> launched and is still running, but the ApplicationMaster link
>>>>>>>>>>>>>>> is not working, and I don't see any errors in the AM logs.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> How do I debug from here? Please help me.
>>>>>>>>>>>>>>> In case there is any other mail thread about this, please
>>>>>>>>>>>>>>> point me to it. Thanks in advance.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Chackra
>>
>>