So I may be looking at this wrong, but where is the RM's state stored if it
does fail over? How will the new RM know to pick up where the old one left
off? This is just one area where my understanding is thin.

That said, what about pre-allocating a second, failover RM somewhere on the
cluster? (I am just tossing out an idea here; there are probably many
reasons not to do this.) Here is how I could see it happening.

1. Myriad starts an RM, asking for 5 random available ports. Mesos replies,
starting the RM, and reports to Myriad the 5 ports used for the services you
listed below. (A rough sketch of pulling ports out of an offer follows.)
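
I have not read Myriad's scheduler code, so this is only a sketch of how I
picture the port-picking step against the Mesos Java bindings (the class and
method names are made up):

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.mesos.Protos.Offer;
    import org.apache.mesos.Protos.Resource;
    import org.apache.mesos.Protos.Value;

    // Illustrative only: grab the first N ports from an offer's "ports" ranges.
    public class PortPicker {
      public static List<Long> pickPorts(Offer offer, int count) {
        List<Long> ports = new ArrayList<Long>();
        for (Resource r : offer.getResourcesList()) {
          if (!"ports".equals(r.getName())) {
            continue;
          }
          for (Value.Range range : r.getRanges().getRangeList()) {
            for (long p = range.getBegin(); p <= range.getEnd(); p++) {
              if (ports.size() == count) {
                return ports;
              }
              ports.add(p);
            }
          }
        }
        return ports;  // caller must verify we actually found 'count' ports
      }
    }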

2. Myriad then checks a config value for the number of "hot spares"; let's
say we specify 1. Myriad then puts in a resource request to Mesos for the
CPU and memory required for the RM, but specifically asks for the same 5
ports allocated to the first. Basically it reserves a spot on another node
with the same ports available (a sketch of that check is below). It may take
a bit, but there should be that availability. Until this request is met, the
YARN cluster is in an HA-compromised position.
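
Again just a sketch, assuming the framework sees raw offers: a check like
this (names invented) could decide whether an offer from another node covers
the exact ports the primary RM holds:

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    import org.apache.mesos.Protos.Offer;
    import org.apache.mesos.Protos.Resource;
    import org.apache.mesos.Protos.Value;

    // Illustrative only: can this offer (from some other node) satisfy a
    // request for exactly the same ports the primary RM is already using?
    public class HotSparePortCheck {
      public static boolean offersSamePorts(Offer offer, List<Long> rmPorts) {
        Set<Long> available = new HashSet<Long>();
        for (Resource r : offer.getResourcesList()) {
          if (!"ports".equals(r.getName())) {
            continue;
          }
          for (Value.Range range : r.getRanges().getRangeList()) {
            for (long p = range.getBegin(); p <= range.getEnd(); p++) {
              available.add(p);
            }
          }
        }
        return available.containsAll(rmPorts);
      }
    }

If the check passes, the framework could hold on to that spot until it is
actually needed.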

3. At this point, perhaps we start another instance of the RM right away
(that depends on my first question about where the RM stores info about
jobs/applications), or the framework just holds the spot, waiting for a lack
of heartbeat (the failover condition) on the primary resource manager, as in
the sketch below.
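
A minimal sketch of the "hold the spot until heartbeats stop" idea; the
timeout value and the notion of Myriad watching RM heartbeats are my
assumptions:

    // Illustrative only: track the primary RM's last heartbeat and declare a
    // failover condition after a quiet period.
    public class RmHeartbeatWatcher {
      private static final long FAILOVER_TIMEOUT_MS = 30_000L;  // made-up value
      private volatile long lastHeartbeatMillis = System.currentTimeMillis();

      public void onHeartbeat() {
        lastHeartbeatMillis = System.currentTimeMillis();
      }

      public boolean failoverConditionMet() {
        return System.currentTimeMillis() - lastHeartbeatMillis > FAILOVER_TIMEOUT_MS;
      }
    }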

4. If we can run the spare with no issues, failover is a simple update of
the DNS record and the node managers connect to the new RM (and another RM
is preallocated for redundancy). If we can't actually execute the secondary
RM until the failover condition occurs, we can execute it at that point, and
the ports will be the same. (A sketch of addressing the RM by a stable DNS
name follows.)
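
For the DNS piece, a sketch of what the NM/client side could look like if
everything addresses the RM by a stable mesos-dns name (the hostname here is
invented), so a failover only has to swing the record while the port stays
fixed:

    import org.apache.hadoop.conf.Configuration;

    // Illustrative only: NMs and clients point at a stable DNS name rather
    // than a raw host; the spare reserved identical ports, so only the DNS
    // record changes on failover.
    public class StableRmAddress {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("yarn.resourcemanager.hostname", "rm.myriad.marathon.mesos");
        conf.set("yarn.resourcemanager.address", "rm.myriad.marathon.mesos:8032");
        System.out.println(conf.get("yarn.resourcemanager.address"));
      }
    }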

This may seem kludgey at first, but done correctly it may actually limit the
length of failover time, since the RM is preallocated. RMs are not huge from
a resource perspective, so this may be a small cost for those who want
failover and multiple clusters (and thus dynamic ports).

I will keep thinking this through, and would welcome feedback.

On Thursday, May 7, 2015, Santosh Marella <smare...@maprtech.com> wrote:

> Hi John,
>
>   Great views about extending mesos dns for rm's discovery. Some thoughts:
>    1. There are 5 primary interfaces RM exposes that are bound to standard
> ports.
>         a. RPC interface for clients that want to submit applications to
> YARN (port 8032).
>         b. RPC interface for NMs to connect back/HB to RM (port 8031).
>         c. RPC interface for App Masters to connect back/HB to RM (port
> 8030).
>         d. RPC interface for admin to interact with RM via CLI (port 8033).
>         e. Web Interface for RM's UI (port 8088).
>    2. When we launch RM using Marathon, it's probably better to mention in
> marathon's config that RM will use the above ports. This is because, if RM
> listens on random ports (as opposed to the above listed standard
> ports), when RM fails over, the new RM gets ports that might be different
> from the ones used by the old RM. This makes the RM's discovery hard,
> especially post failover.
>    3. It looks like what you are proposing is a way to update mesos-dns as
> to what ports RM's services are listening on. And when RM fails over, these
> ports would get updated in mesos-dns. Is my understanding correct? If yes,
> one challenge I see is that the clients that want to connect to the above
> listed RM interfaces also need to pull the changes to RM's port numbers
> from mesos-dns dynamically. Not sure how that might be possible.
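
(Inline note from me: if I have the yarn-default values right, those five
interfaces correspond to the properties below, which would be the ports
pinned in Marathon; sketched with the Hadoop Configuration API just to keep
the names straight.)

    import org.apache.hadoop.conf.Configuration;

    // Standard yarn-site properties for the five RM interfaces (defaults shown).
    public class RmStandardPorts {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("yarn.resourcemanager.address", "0.0.0.0:8032");                  // (a) client RPC
        conf.set("yarn.resourcemanager.resource-tracker.address", "0.0.0.0:8031"); // (b) NM heartbeats
        conf.set("yarn.resourcemanager.scheduler.address", "0.0.0.0:8030");        // (c) AM heartbeats
        conf.set("yarn.resourcemanager.admin.address", "0.0.0.0:8033");            // (d) admin CLI
        conf.set("yarn.resourcemanager.webapp.address", "0.0.0.0:8088");           // (e) web UI
      }
    }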
>
>   Regarding your question about NM ports
>   1. NM has the following ports:
>       a. RPC port for app masters to launch containers (this is a random
> port).
>       b. RPC port for localization service. (port 8040)
>       c. Web port for NM's UI (port 8042).
>    2. Ports (a) and (c) are relayed to RM when NM registers with RM. Port
> (b) is passed to a local container executor process via command line args.
>    3. As you rightly reckon, we need a mechanism at launch of NM to pass
> the Mesos-allocated ports to NM for the above interfaces. We can try to
> use the variable expansion mechanism Hadoop has
> (http://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/conf/Configuration.html)
> to achieve this.
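
(Inline note from me: if I follow the variable expansion idea, the NM side
might look roughly like this; the myriad.* property name and the port value
are invented for illustration.)

    import org.apache.hadoop.conf.Configuration;

    // Illustrative only: yarn-site carries a ${...} placeholder, and the NM
    // executor defines the referenced property at launch with the port Mesos
    // allocated to this NM.
    public class NmPortExpansion {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Would normally live in the NM's yarn-site.xml:
        conf.set("yarn.nodemanager.webapp.address", "${myriad.nm.webapp.address}");
        // Would be injected at NM launch from the Mesos-allocated ports:
        conf.set("myriad.nm.webapp.address", "0.0.0.0:31042");
        // Configuration.get() expands the placeholder on read:
        System.out.println(conf.get("yarn.nodemanager.webapp.address"));  // 0.0.0.0:31042
      }
    }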
>
> Thanks,
> Santosh
>
> On Thu, May 7, 2015 at 3:51 AM, John Omernik <j...@omernik.com> wrote:
>
> > I've implemented mesos-dns and use marathon to launch my myriad
> framework.
> > It shows up as myriad.marathon.mesos and makes it easy to find what node
> the
> > framework launched the resource manager on.
> >
> >  What if we made myriad mesos-dns aware, and prior to launching the yarn
> > rm, it could register in mesos dns. This would mean both the ip addresses
> > and the ports (we need to figure out multiple ports in mesos-dns). Then
> it
> > could write out ports and host names in the nm configs by checking mesos
> > dns for which ports the resource manager is using.
>
>
> > Side question:  when a node manager registers with the resource manager
> > are the ports the nm is running on completely up to the nm? Ie I can run
> my
> > nm web server any port, Yarn just explains that to the rm on
> registration?
> > Because then we need a mechanism at launch of the nm task to understand
> > which ports mesos has allocated to the nm and update the yarn-site for
> that
> > nm before launch.... Perhaps mesos-dns as a requirement isn't needed,
> but I
> > am trying to walk through options that get us closer to multiple yarn
> > clusters on a mesos cluster.
> >
> > John
> >
> >
> > --
> > Sent from my iThing
> >
>


-- 
Sent from my iThing
