Zhongyue,

Glad to know it worked.

As I understand it, Myriad does pass the RM's IP to the NM executor when NM
tasks are launched through Mesos, but for that to work the RM's address has
to be configured. Without this configuration, there is no way to know in
advance which address the RM gets when it is launched. Correct me if I am
wrong or have misunderstood your question.

When I tried Myriad, I used the mesos-dns service to take care of RM
discovery. The only thing I needed to do was specify the RM's
*.marathon.mesos address in yarn-site.xml or on the command line (this
applies when the RM is launched through Marathon). This also helps with
Myriad HA: if the RM is relaunched on another node in the cluster after a
failure, the name still resolves to it, so there is no need to keep changing
the address manually.
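A minimal shell sketch of that setup, assuming the RM runs as a Marathon app
with a top-level app id (the "resourcemanager" id below is hypothetical) and
that Mesos-DNS serves its default "mesos" domain. It builds the
<app-id>.marathon.mesos name and passes it to the RM as
yarn.resourcemanager.hostname; the same name would go into the NMs'
yarn-site.xml.

```shell
#!/bin/sh
# Sketch only: use the RM's Mesos-DNS name instead of hard-coding an IP.
# Assumptions (hypothetical): the RM is a Marathon app with a top-level
# app id, and Mesos-DNS serves the default "mesos" domain.

# Print the Mesos-DNS name of a Marathon task: <app-id>.marathon.<domain>
rm_dns_name() {
  printf '%s.marathon.%s\n' "$1" "${2:-mesos}"
}

# Print the JVM option that sets yarn.resourcemanager.hostname to that name.
rm_opts() {
  printf '%s\n' "-Dyarn.resourcemanager.hostname=$(rm_dns_name "$1" "$2")"
}

# Usage ("resourcemanager" is a hypothetical Marathon app id):
RM_HOST="$(rm_dns_name resourcemanager)"
export YARN_RESOURCEMANAGER_OPTS="$(rm_opts resourcemanager)"
echo "RM hostname: $RM_HOST"
# Optional check that Mesos-DNS resolves the name on this node:
#   getent hosts "$RM_HOST"
# Then start the RM with the option in its environment:
#   yarn resourcemanager
```

Because a name rather than an IP is configured, an RM that Marathon
relaunches on another node should be reachable without editing the config.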

As for your second question: can you see the Mesos task info for the NM task
in the Mesos UI, i.e. whether it was launched or is active? Mesos also
provides a sandbox link from which the NM logs can be fetched. The same goes
for the RM logs, since the RM also runs as a Mesos task/executor. Otherwise,
the Mesos task logs should be somewhere under /tmp/mesos/* on the slave where
the task ran (the usual Mesos work directory).
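If the UI is not handy, here is a rough shell sketch for finding sandbox logs
directly on a slave. It assumes the slave's work directory is /tmp/mesos
(overridable via the first argument or a hypothetical MESOS_WORK_DIR
variable) and the standard sandbox layout
<work_dir>/slaves/<slave_id>/frameworks/<framework_id>/executors/<executor_id>/runs/<container_id>/.
The executor-id glob is a placeholder you would adjust for the NM executor.

```shell
#!/bin/sh
# Sketch: list stdout/stderr files from Mesos task sandboxes on a slave.
# Assumptions: default work dir /tmp/mesos (or $MESOS_WORK_DIR, a hypothetical
# override), and an optional executor-id glob used as a placeholder filter.

list_sandbox_logs() {
  work_dir="${1:-${MESOS_WORK_DIR:-/tmp/mesos}}"
  executor_glob="${2:-*}"
  # Layout: <work_dir>/slaves/<slave>/frameworks/<fw>/executors/<executor>/runs/<container>/
  # find does not follow the runs/latest symlink, so each sandbox is listed once.
  find "$work_dir/slaves" -path "*/executors/$executor_glob/runs/*" \
    \( -name stdout -o -name stderr \) -type f 2>/dev/null | sort
}
```

For example, list_sandbox_logs /tmp/mesos prints every sandbox's stdout and
stderr path; tailing the stderr of the NM's sandbox may show why its task
stays in STAGING.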

Any more info or logs on the issue would help us triage it, since this could
also be a configuration issue. Anyway, ping us again if you run into any
non-obvious issues or need help with anything else.

-Sarjeet



On Tue, Sep 22, 2015 at 7:41 PM, Zhongyue Luo <[email protected]>
wrote:

> Thanks Sarjeet, it works.
>
> However, this seems very strange. Shouldn't the RM's IP be included in the
> task info so that the executor injects the IP when launching the NM?
>
> Also, I can see in the RM web UI that the default NM has registered with
> the RM, but its task status in the Mesos web UI is still "STAGING".
> Is this normal?
>
> On Tue, Sep 22, 2015 at 11:19 PM, Sarjeet Singh <[email protected]>
> wrote:
>
> > Zhongyue,
> >
> > You can specify the RM's IP on the command line when starting the RM, or
> > you can set the following property in yarn-site.xml:
> >
> > <property>
> >   <name>yarn.resourcemanager.hostname</name>
> >   <value>RM IP</value>
> > </property>
> >
> > OR
> >
> > From the command line:
> >
> > YARN_RESOURCEMANAGER_OPTS=-Dyarn.resourcemanager.hostname=<RM_IP> yarn resourcemanager
> >
> > ===========================
> >
> > Try the above and see if it works.
> >
> > -Sarjeet
> >
> > On Tue, Sep 22, 2015 at 1:04 AM, Zhongyue Luo <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > I've recently redeployed Myriad in our Mesos cluster.
> > >
> > > However, the node managers fail because they are trying to connect to an
> > > invalid Resource Manager IP.
> > >
> > > Below is part of the log from one of the Mesos agents that attempts to
> > > launch a Node Manager.
> > >
> > > 15/09/22 15:41:52 INFO webapp.WebApps: Web app /node started at 8042
> > > 15/09/22 15:41:52 INFO webapp.WebApps: Registered webapp guice modules
> > > 15/09/22 15:41:52 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8031
> > > 15/09/22 15:41:52 INFO nodemanager.NodeStatusUpdaterImpl: Sending out 0 NM container statuses: []
> > > 15/09/22 15:41:52 INFO nodemanager.NodeStatusUpdaterImpl: Registering with RM using containers :[]
> > > 15/09/22 15:41:54 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> > > 15/09/22 15:41:55 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> > > 15/09/22 15:41:56 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> > > 15/09/22 15:41:57 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> > >
> > > You can see that it attempts to connect to 0.0.0.0:8031, even though the
> > > active Resource Manager is running on a different host.
> > >
> > > I've followed the instructions here.
> > > https://github.com/mesos/myriad/blob/phase1/docs/myriad-dev.md
> > >
> > > Which configuration do I need to recheck to get this right?
> > >
> > > Thanks in advance.
> > >
> > > -zhongyue
> > >
> > > --
> > > *Intel SSG/STO/BDT*
> > > 880 Zixing Road, Zizhu Science Park, Minhang District, 200241, Shanghai,
> > > China
> > > +862161166500
> > >
> >
>
