The challenge, I think, is the ports. We have 5 ports that are needed for
an RM; do we predefine those? I think Yuliya is saying yes, we should. An
interesting compromise: rather than truly random ports, when we define a
YARN cluster, we take responsibility for defining our 5 "service" ports
using the Marathon/HAProxy service ports. (This now requires HAProxy as
well as mesos-dns. I'd recommend some work be done on documenting
HAProxy for use with the haproxy script; I know that I stumbled a bit
trying to get HAProxy set up, but that may just be my own lack of knowledge
on the subject.) These ports will have to be available across the cluster,
and will map to whichever ports Mesos assigns to the RM.
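Concretely, the compromise might look something like the Marathon app definition below. This is only a sketch: the app id, port numbers, and resource values are made up, and the exact shape of the `ports`/`servicePorts` fields depends on the Marathon version in use. The idea is that the admin-chosen service ports are declared up front so HAProxy can route them, while Mesos remains free to assign the actual host ports:

```json
{
  "id": "yarnprod-rm",
  "cmd": "yarn resourcemanager",
  "cpus": 2.0,
  "mem": 4096,
  "instances": 1,
  "ports": [10032, 10031, 10030, 10033, 10088]
}
```

With the haproxy bridge script, traffic to e.g. port 10032 on any node would then be forwarded to whichever host:port Mesos actually assigned to the RM's client RPC interface.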

This makes sense to me: a "YARN cluster creation" event on a Mesos cluster
is something we want to be flexible, but it's not something that will
likely be "self service", i.e. we won't have users just creating YARN
clusters at will. More likely, when a cluster is requested, the admin
identifies 5 available service ports and locks those into that
cluster. That way, when the YARN RM spins up, it has its service ports
defined (and thus the node managers always know which ports to connect to).
Combined with Mesos-DNS, this could actually work out very well, as the
name of the RM can be hard coded, and the ports will just work no
matter which node it spins up on.
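For example (the `yarnprod-rm` app name here is hypothetical, and the ports are the admin-chosen service ports from the scheme above; the property keys are the standard YARN ones), the yarn-site.xml distributed to node managers and clients could hard-code the Mesos-DNS name:

```xml
<!-- Sketch: RM addresses pinned to a Mesos-DNS name plus admin-chosen
     service ports, so they stay valid across RM failover. -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>yarnprod-rm.marathon.mesos</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>yarnprod-rm.marathon.mesos:10032</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>yarnprod-rm.marathon.mesos:10031</value>
</property>
```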

From an HA perspective, the only advantages at this point to preallocating
the failover RM are speed of recovery and a guarantee of resources being
available if failover occurs. Perhaps we could offer this as an option
for those who need fast or guaranteed recovery, but not make it a
requirement?

The service port method will not work, however, for the node manager ports.
That said, I *believe* that as Myriad spins up a node manager, it can
dynamically allocate the ports and report those to the resource
manager on registration; someone may need to help me out on that one, as I
am not sure. Also, since the node manager is host specific, mesos-dns is
not required: it can register with the resource manager using whatever ports
are allocated and the hostname it's running on. I guess the question here
is: when Myriad requests the resources and Mesos allocates the ports, can
Myriad, prior to actually starting the node manager, update the configs
with the allocated ports? Or is this even needed?
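If it is possible, the executor-side step could be as small as this sketch (the helper function is mine, not actual Myriad code; the three property keys are the standard NodeManager address settings in yarn-site.xml):

```python
# Sketch (not actual Myriad code): map ports from a Mesos offer onto the
# YARN NodeManager config keys before the NM process is launched.

def nm_port_overrides(allocated_ports):
    """Given ports allocated by Mesos, return yarn-site overrides the
    executor could write out (or pass via -D) before starting the NM."""
    if len(allocated_ports) < 3:
        raise ValueError("NM needs at least 3 ports from the offer")
    rpc, localizer, web = allocated_ports[:3]
    return {
        "yarn.nodemanager.address": "0.0.0.0:%d" % rpc,
        "yarn.nodemanager.localizer.address": "0.0.0.0:%d" % localizer,
        "yarn.nodemanager.webapp.address": "0.0.0.0:%d" % web,
    }

# Example: Mesos offered ports 31000-31002 to this task.
overrides = nm_port_overrides([31000, 31001, 31002])
print(overrides["yarn.nodemanager.webapp.address"])  # 0.0.0.0:31002
```

Since the NM registers these addresses with the RM anyway, nothing else in the cluster would need to know them in advance.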

This is a great discussion.

On Mon, May 11, 2015 at 9:58 PM, yuliya Feldman <[email protected]> wrote:

> As far as I understand, in this case Apache YARN RM HA will kick in, which
> means all the ids, hosts, and ports for all RMs will need to be defined
> somewhere, and I wonder how that will be done in this situation, since those
> either need to be in yarn-site.xml or passed using "-D".
> In the case of Mesos-DNS usage there is no need to set up RM HA at all, and
> no warm standby is needed. Marathon will start the RM somewhere in case of
> failure, and clients will rediscover it based on the same hostname.
> Am I missing anything?
>       From: Adam Bordelon <[email protected]>
>  To: [email protected]
>  Sent: Monday, May 11, 2015 7:26 PM
>  Subject: Re: Recommending or requiring mesos dns?
>
> I'm a +1 for random ports. You can also use Marathon's servicePort field to
> let HAProxy redirect from the servicePort to the actual hostPort for the
> service on each node. Mesos-DNS will similarly direct you to the correct
> host:port given the appropriate task name.
>
> Is there a reason we can't just have Marathon launch two RM tasks for the
> same YARN cluster? One would be the leader, and the other would redirect to
> it until failover. Once one fails over, the other will start taking
> traffic, and Marathon will try to launch a new backup RM when the resources
> are available. If the YARN RM cannot provide us this functionality on its
> own, perhaps we can write a simple wrapper script for it.
>
>
>
> On Fri, May 8, 2015 at 11:57 AM, John Omernik <[email protected]> wrote:
>
> > I would advocate random ports because there should not be a limitation
> > of running only one RM per node. If we want true portability, there
> > should be the ability to have the RM for the cluster YarnProd running on
> > node1 and also have the RM for the cluster YarnDev running on node1 (if
> > it so happens to land this way). That way the number of clusters isn't
> > limited by the number of physical nodes.
> >
> > On Fri, May 8, 2015 at 1:33 PM, Santosh Marella <[email protected]> wrote:
> >
> > > RM can store its data either in HDFS or in ZooKeeper. The data store
> > > is configurable. There is a config property in YARN
> > > (yarn.resourcemanager.recovery.enabled) that tells RM whether it
> > > should try to recover the metadata about the previously submitted
> > > apps, the containers allocated to them, etc. from the state store.
> > >
> > > Pre-allocation of a backup RM is a great idea. Thinking about it a bit
> > > more, I felt it might be better to have such an option available in
> > > Marathon rather than building it into Myriad (and into all
> > > frameworks/services that want HA/failover).
> > >
> > > Let's say we launch a service X via Marathon that requires some
> > > resources (cpus/mem/ports), and we want 1 instance of that service to
> > > be always available. Marathon promises restart of the service if it
> > > goes down. But, as far as I understand, Marathon can restart the
> > > service on another node only if the resources required by service X
> > > are available on that node *after* the service goes down. In other
> > > words, Marathon doesn't proactively "reserve" these resources on
> > > another node as a backup for failover.
> > >
> > > Again, not all services launched via Marathon require this, but
> > > perhaps there should be a config option to specify whether a service
> > > desires to have Marathon keep a backup node ready-to-go in the event
> > > of failure.
> > >
> > >
> > > On Thu, May 7, 2015 at 4:12 PM, John Omernik <[email protected]> wrote:
> > >
> > > > So I may be looking at this wrong, but where is the data for the RM
> > > > stored if it does fail over? How will it know to pick up where it
> > > > left off? This is just one area I am low in understanding on.
> > > >
> > > > That said, what about pre-allocating a second failover RM somewhere
> > > > on the cluster? (I am just tossing out an idea here, in that there
> > > > are probably many reasons not to do this.) Here is how I could see it
> > > > happening.
> > > >
> > > > 1. Myriad starts an RM asking for 5 random available ports. Mesos
> > > > replies, starting the RM, and reports to Myriad the 5 ports used for
> > > > the services you listed below.
> > > >
> > > > 2. Myriad then checks a config value for the number of "hot spares";
> > > > let's say we specify 1. Myriad then puts in a resource request to
> > > > Mesos for the CPU and memory required for the RM, but specifically
> > > > asks for the same 5 ports allocated to the first. Basically it
> > > > reserves a spot on another node with the same ports available. It
> > > > may take a bit, but there should be that availability. Until this
> > > > request is met, the YARN cluster is in an HA-compromised position.
> > > >
> > >
> > > This is exactly what I think we should do, but why use random ports
> > > instead of standard RM ports? If you have 10 slave nodes in your Mesos
> > > cluster, then there are 10 potential spots for the RM to be launched
> > > on. However, if you choose to launch multiple RMs (multiple YARN
> > > clusters), then you can probably launch at most 5 (with remaining 5 nodes available
> > >
> > > >
> > > > 3. At this point perhaps we start another instance of the RM right
> > > > away (depends on my first question of where the RM stores info about
> > > > jobs/applications), or the framework just holds the spot, waiting
> > > > for a lack of heartbeat (a failover condition) on the primary
> > > > resource manager.
> > > >
> > > > 4. If we can run the spare with no issues, it's a simple update of
> > > > the DNS record, and the node managers connect to the new RM (and
> > > > another RM is preallocated for redundancy). If we can't actually
> > > > execute the secondary RM until failover conditions, we can now
> > > > execute the new RM, and the ports will be the same.
> > > >
> > > > This may seem kludgey at first, but done correctly, it may actually
> > > > limit the length of failover time, as the RM is preallocated. RMs
> > > > are not huge from a resource perspective, thus it may be a small
> > > > cost for those who want failover and multiple clusters (thus having
> > > > dynamic ports).
> > > >
> > > > I will keep thinking this through, and would welcome feedback.
> > > >
> > > > On Thursday, May 7, 2015, Santosh Marella <[email protected]> wrote:
> > > >
> > > > > Hi John,
> > > > >
> > > > > Great views about extending mesos-dns for the RM's discovery.
> > > > > Some thoughts:
> > > > >    1. There are 5 primary interfaces the RM exposes that are bound
> > > > > to standard ports:
> > > > >        a. RPC interface for clients that want to submit
> > > > > applications to YARN (port 8032).
> > > > >        b. RPC interface for NMs to connect back/HB to RM (port
> > > > > 8031).
> > > > >        c. RPC interface for App Masters to connect back/HB to RM
> > > > > (port 8030).
> > > > >        d. RPC interface for the admin to interact with RM via CLI
> > > > > (port 8033).
> > > > >        e. Web interface for RM's UI (port 8088).
> > > > >    2. When we launch RM using Marathon, it's probably better to
> > > > > mention in Marathon's config that RM will use the above ports.
> > > > > This is because, if RM listens on random ports (as opposed to the
> > > > > above listed standard ports), then when RM fails over, the new RM
> > > > > gets ports that might be different from the ones used by the old
> > > > > RM. This makes the RM's discovery hard, especially post failover.
> > > > >    3. It looks like what you are proposing is a way to update
> > > > > mesos-dns as to what ports RM's services are listening on, and
> > > > > when RM fails over, these ports would get updated in mesos-dns. Is
> > > > > my understanding correct? If yes, one challenge I see is that the
> > > > > clients that want to connect to the above listed RM interfaces
> > > > > also need to pull the changes to RM's port numbers from mesos-dns
> > > > > dynamically. Not sure how that might be possible.
> > > > >
> > > > > Regarding your question about NM ports:
> > > > >    1. NM has the following ports:
> > > > >        a. RPC port for app masters to launch containers (this is
> > > > > a random port).
> > > > >        b. RPC port for the localization service (port 8040).
> > > > >        c. Web port for NM's UI (port 8042).
> > > > >    2. Ports (a) and (c) are relayed to RM when NM registers with
> > > > > RM. Port (b) is passed to a local container executor process via
> > > > > command line args.
> > > > >    3. As you rightly reckon, we need a mechanism at launch of NM
> > > > > to pass the mesos-allocated ports to NM for the above interfaces.
> > > > > We can try to use the variable expansion mechanism
> > > > > <http://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/conf/Configuration.html>
> > > > > Hadoop has to achieve this.
> > > > >
> > > > > Thanks,
> > > > > Santosh
> > > > >
> > > > > On Thu, May 7, 2015 at 3:51 AM, John Omernik <[email protected]> wrote:
> > > > >
> > > > > > I've implemented mesos-dns and use Marathon to launch my Myriad
> > > > > > framework. It shows up as myriad.marathon.mesos and makes it
> > > > > > easy to find what node the framework launched the resource
> > > > > > manager on.
> > > > > >
> > > > > > What if we made Myriad mesos-dns aware, and prior to launching
> > > > > > the YARN RM, it could register in mesos-dns? This would mean
> > > > > > both the IP addresses and the ports (we need to figure out
> > > > > > multiple ports in mesos-dns). Then it could write out the ports
> > > > > > and host names in the NM configs by checking mesos-dns for which
> > > > > > ports the resource manager is using.
> > > > > >
> > > > > > Side question: when a node manager registers with the resource
> > > > > > manager, are the ports the NM is running on completely up to the
> > > > > > NM? I.e., I can run my NM web server on any port, and YARN just
> > > > > > explains that to the RM on registration? Because then we need a
> > > > > > mechanism at launch of the NM task to understand which ports
> > > > > > mesos has allocated to the NM and update the yarn-site for that
> > > > > > NM before launch.... Perhaps mesos-dns as a requirement isn't
> > > > > > needed, but I am trying to walk through options that get us
> > > > > > closer to multiple YARN clusters on a mesos cluster.
> > > > > >
> > > > > > John
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Sent from my iThing
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Sent from my iThing
> > > >
> > >
> >
>
