I might be missing something, but I didn't understand why mesos-dns would
be required in addition to HAProxy. If we configure RM to bind to random
ports, but keep RM reachable via HAProxy on RM's service ports, won't all
the clients (such as NMs, HiveServer2, etc.) just use HAProxy to reach RM?
If yes, why is mesos-dns needed?

I have very limited knowledge about HAProxy configuration in a Mesos
cluster. I just read through this doc:
https://docs.mesosphere.com/getting-started/service-discovery/ and what I
inferred is that an HAProxy instance runs on every slave node, so if an NM
running on a slave node needs to reach RM, it would simply use an RM
address that looks like "localhost:99999" (where 99999 is an
admin-identified RPC service port for RM).
Since the HAProxy on the NM's localhost listens on 99999, it just forwards
the traffic to RM's IP:RandomPort. Am I understanding this correctly?
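
In other words, I'm picturing each slave's HAProxy ending up with
something roughly like this (a rough sketch; the backend IP and both
ports are made up):

    # haproxy.cfg fragment on every slave (hypothetical values)
    listen yarn-rm-rpc
      bind 127.0.0.1:99999        # admin-chosen "service" port for RM's client RPC
      mode tcp
      server rm 10.0.0.12:31267   # node and random port RM actually got from Mesos

So the NM (or any client) just talks to localhost:99999 and never needs
to know where RM is actually running.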

Thanks,
Santosh

On Tue, May 12, 2015 at 5:41 AM, John Omernik <[email protected]> wrote:

> The challenge, I think, is the ports. So we have 5 ports that are needed
> for an RM; do we predefine those? I think Yuliya is saying yes, we
> should. An interesting compromise... rather than truly random ports, when
> we define a YARN cluster, we take responsibility for defining its 5
> "service" ports using the Marathon/HAProxy service ports. (This now
> requires HAProxy as well as mesos-dns. I'd recommend some work being done
> on documenting HAProxy for use with the haproxy script; I know that I
> stumbled a bit trying to get HAProxy set up, but that may just be my own
> lack of knowledge on the subject.) These ports will have to be available
> across the cluster, and will map to whichever ports Mesos assigns to the
> RM.
>
> This makes sense to me. A "YARN cluster creation" event on a Mesos
> cluster is something we want to be flexible, but it's not something that
> will likely be "self service", i.e. we won't have users just creating
> YARN clusters at will. It will likely be something where, when requested,
> the admin can identify 5 available service ports and lock those into that
> cluster... that way, when the YARN RM spins up, it has its service ports
> defined (and thus the node managers always know which ports to connect
> to). Combined with Mesos-DNS, this could actually work out very well, as
> the name of the RM can be hard-coded, and the ports will just work no
> matter which node it spins up on.
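>
> For example, yarn-site.xml could then be completely static everywhere;
> here is a rough sketch (the hostname is just whatever Mesos-DNS ends up
> calling the RM task, and 8032/8031 are admin-chosen service ports):
>
>     <property>
>       <name>yarn.resourcemanager.hostname</name>
>       <value>myriad.marathon.mesos</value>
>     </property>
>     <property>
>       <name>yarn.resourcemanager.address</name>
>       <value>myriad.marathon.mesos:8032</value>
>     </property>
>     <property>
>       <name>yarn.resourcemanager.resource-tracker.address</name>
>       <value>myriad.marathon.mesos:8031</value>
>     </property>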
>
> From an HA perspective, the only advantage at this point of preallocating
> the failover RM is speed of recovery (and a guarantee of resources being
> available if failover occurs). Perhaps we could consider this as an
> option for those who need fast or guaranteed recovery, but not make it a
> requirement?
>
> The service port method will not work, however, for the node manager
> ports. That said, I "believe" that as Myriad spins up a node manager, it
> can dynamically allocate the ports and thus report those to the resource
> manager on registration. Someone may need to help me out on that one, as
> I am not sure. Also, since the node manager is host-specific, mesos-dns
> is not required; it can register with the resource manager using whatever
> ports are allocated and the hostname it's running on. I guess the
> question here is: when Myriad requests the resources and Mesos allocates
> the ports, can Myriad, prior to actually starting the node manager,
> update the configs with the allocated ports? Or is this even needed?
>
> This is a great discussion.
>
> On Mon, May 11, 2015 at 9:58 PM, yuliya Feldman <[email protected]>
> wrote:
>
> > As far as I understand, in this case Apache YARN RM HA will kick in,
> > which means all the ids, hosts, and ports for all RMs will need to be
> > defined somewhere, and I wonder how that will be defined in this
> > situation, since those either need to be in yarn-site.xml or passed
> > using "-D".
> > In the case of Mesos-DNS there is no need to set up RM HA at all, and
> > no warm standby is needed. Marathon will start RM somewhere in case of
> > failure, and clients will rediscover it based on the same hostname.
> > Am I missing anything?
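> >
> > (For reference, standard YARN RM HA means statically defining
> > something like the following in yarn-site.xml; this is a rough sketch
> > with made-up rm ids and hostnames:)
> >
> >     <property>
> >       <name>yarn.resourcemanager.ha.enabled</name>
> >       <value>true</value>
> >     </property>
> >     <property>
> >       <name>yarn.resourcemanager.ha.rm-ids</name>
> >       <value>rm1,rm2</value>
> >     </property>
> >     <property>
> >       <name>yarn.resourcemanager.hostname.rm1</name>
> >       <value>node1.example.com</value>
> >     </property>
> >     <property>
> >       <name>yarn.resourcemanager.hostname.rm2</name>
> >       <value>node2.example.com</value>
> >     </property>
> >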
> > From: Adam Bordelon <[email protected]>
> > To: [email protected]
> > Sent: Monday, May 11, 2015 7:26 PM
> > Subject: Re: Recommending or requiring mesos dns?
> >
> > I'm a +1 for random ports. You can also use Marathon's servicePort
> > field to let HAProxy redirect from the servicePort to the actual
> > hostPort for the service on each node. Mesos-DNS will similarly direct
> > you to the correct host:port given the appropriate task name.
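> >
> > (A quick way to see what Mesos-DNS would hand back for a task; the
> > task name and the answer below are only illustrative:)
> >
> >     $ dig +short _myriad._tcp.marathon.mesos SRV
> >     0 0 31905 myriad-a1b2c.marathon.mesos.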
> >
> > Is there a reason we can't just have Marathon launch two RM tasks for
> > the same YARN cluster? One would be the leader, and the other would
> > redirect to it until failover. Once one fails over, the other will
> > start taking traffic, and Marathon will try to launch a new backup RM
> > when the resources are available. If the YARN RM cannot provide us this
> > functionality on its own, perhaps we can write a simple wrapper script
> > for it.
> >
> >
> >
> > On Fri, May 8, 2015 at 11:57 AM, John Omernik <[email protected]> wrote:
> >
> > > I would advocate random ports because there should not be a
> > > limitation of running only one RM per node. If we want true
> > > portability, there should be the ability to have the RM for the
> > > cluster YarnProd run on node1 and also have the RM for the cluster
> > > YarnDev running on node1 (if it so happens to land this way). That
> > > way the number of clusters isn't limited by the number of physical
> > > nodes.
> > >
> > > On Fri, May 8, 2015 at 1:33 PM, Santosh Marella <[email protected]>
> > > wrote:
> > >
> > > > RM can store its data either in HDFS or in ZooKeeper. The data
> > > > store is configurable. There is a config property in YARN
> > > > (yarn.resourcemanager.recovery.enabled) that tells RM whether it
> > > > should try to recover the metadata about the previously submitted
> > > > apps, the containers allocated to them, etc. from the state store.
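> > > >
> > > > (For example, the ZooKeeper-backed store would look roughly like
> > > > this in yarn-site.xml; the ZK address below is just a placeholder:)
> > > >
> > > >     <property>
> > > >       <name>yarn.resourcemanager.recovery.enabled</name>
> > > >       <value>true</value>
> > > >     </property>
> > > >     <property>
> > > >       <name>yarn.resourcemanager.store.class</name>
> > > >       <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
> > > >     </property>
> > > >     <property>
> > > >       <name>yarn.resourcemanager.zk-address</name>
> > > >       <value>zk1.example.com:2181</value>
> > > >     </property>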
> > > >
> > > > Pre-allocation of a backup RM is a great idea. Thinking about it a
> > > > bit more, I felt it might be better to have such an option
> > > > available in Marathon rather than building it into Myriad (and into
> > > > all frameworks/services that want HA/failover).
> > > >
> > > >  Let's say we launch a service X via Marathon that requires some
> > > > resources (cpus/mem/ports), and we want 1 instance of that service
> > > > to always be available. Marathon promises restart of the service if
> > > > it goes down. But, as far as I understand, Marathon can restart the
> > > > service on another node only if the resources required by service X
> > > > are available on that node *after* the service goes down. In other
> > > > words, Marathon doesn't proactively "reserve" these resources on
> > > > another node as a backup for failover.
> > > >
> > > > Again, not all services launched via Marathon require this, but
> > > > perhaps there should be a config option to specify whether a
> > > > service wants Marathon to keep a backup node ready to go in the
> > > > event of failure.
> > > >
> > > >
> > > > On Thu, May 7, 2015 at 4:12 PM, John Omernik <[email protected]>
> > > > wrote:
> > > >
> > > > > So I may be looking at this wrong, but where is the data for the
> > > > > RM stored if it does fail over? How will it know to pick up where
> > > > > it left off? This is just one area I am low in understanding on.
> > > > >
> > > > >  That said, what about preallocating a second failover RM
> > > > > somewhere on the cluster? (I am just tossing out an idea here, in
> > > > > that there are probably many reasons not to do this.) But here is
> > > > > how I could see it happening:
> > > > >
> > > > > 1. Myriad starts an RM asking for 5 random available ports. Mesos
> > > > > replies, starting the RM, and reports to Myriad the 5 ports used
> > > > > for the services you listed below.
> > > > >
> > > > > 2. Myriad then checks a config value for the number of "hot
> > > > > spares"; let's say we specify 1. Myriad then puts in a resource
> > > > > request to Mesos for the CPU and memory required for the RM, but
> > > > > specifically asks for the same 5 ports allocated to the first.
> > > > > Basically it reserves a spot on another node with the same ports
> > > > > available. It may take a bit, but there should be that
> > > > > availability. Until this request is met, the YARN cluster is in
> > > > > an HA-compromised position.
> > > > >
> > > >
> > > >    This is exactly what I think we should do, but why use random
> > > > ports instead of standard RM ports? If you have 10 slave nodes in
> > > > your Mesos cluster, then there are 10 potential spots for RM to be
> > > > launched on. However, if you choose to launch multiple RMs (multiple
> > > > YARN clusters), then you can probably launch at most 5 (with the
> > > > remaining 5 nodes available
> > > >
> > > > >
> > > > > 3. At this point perhaps we start another instance of the RM
> > > > > right away (depends on my first question about where the RM
> > > > > stores info about jobs/applications), or the framework just holds
> > > > > the spot, waiting for a lack of heartbeat (failover condition) on
> > > > > the primary resource manager.
> > > > >
> > > > > 4. If we can run the spare with no issues, it's a simple update
> > > > > of the DNS record and the node managers connect to the new RM
> > > > > (and another RM is preallocated for redundancy). If we can't
> > > > > actually execute the secondary RM until failover conditions, we
> > > > > can now execute the new RM, and the ports will be the same.
> > > > >
> > > > > This may seem kludgey at first, but done correctly, it may
> > > > > actually limit the length of failover time, as the RM is
> > > > > preallocated. RMs are not huge from a resource perspective, so it
> > > > > may be a small cost for those who want failover and multiple
> > > > > clusters (and thus dynamic ports).
> > > > >
> > > > > I will keep thinking this through, and would welcome feedback.
> > > > >
> > > > > On Thursday, May 7, 2015, Santosh Marella <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Hi John,
> > > > > >
> > > > > >  Great views about extending mesos-dns for RM's discovery. Some
> > > > > > thoughts:
> > > > > >    1. There are 5 primary interfaces RM exposes that are bound
> > > > > > to standard ports (the corresponding yarn-site.xml property
> > > > > > names are listed after these thoughts):
> > > > > >        a. RPC interface for clients that want to submit
> > > > > > applications to YARN (port 8032).
> > > > > >        b. RPC interface for NMs to connect back/HB to RM (port
> > > > > > 8031).
> > > > > >        c. RPC interface for App Masters to connect back/HB to
> > > > > > RM (port 8030).
> > > > > >        d. RPC interface for admin to interact with RM via CLI
> > > > > > (port 8033).
> > > > > >        e. Web interface for RM's UI (port 8088).
> > > > > >    2. When we launch RM using Marathon, it's probably better to
> > > > > > mention in Marathon's config that RM will use the above ports.
> > > > > > This is because, if RM listens on random ports (as opposed to
> > > > > > the standard ports listed above), then when RM fails over, the
> > > > > > new RM gets ports that might be different from the ones used by
> > > > > > the old RM. This makes the RM's discovery hard, especially post
> > > > > > failover.
> > > > > >    3. It looks like what you are proposing is a way to update
> > > > > > mesos-dns as to what ports RM's services are listening on, and
> > > > > > when RM fails over, these ports would get updated in mesos-dns.
> > > > > > Is my understanding correct? If yes, one challenge I see is
> > > > > > that the clients that want to connect to the RM interfaces
> > > > > > listed above also need to pull the changes to RM's port numbers
> > > > > > from mesos-dns dynamically. Not sure how that might be possible.
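> > > > > >
> > > > > > (For reference, these are the yarn-site.xml properties behind
> > > > > > the 5 ports above; "rm-host" is just a placeholder:)
> > > > > >
> > > > > >     yarn.resourcemanager.address                  rm-host:8032
> > > > > >     yarn.resourcemanager.resource-tracker.address rm-host:8031
> > > > > >     yarn.resourcemanager.scheduler.address        rm-host:8030
> > > > > >     yarn.resourcemanager.admin.address            rm-host:8033
> > > > > >     yarn.resourcemanager.webapp.address           rm-host:8088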
> > > > > >
> > > > > >  Regarding your question about NM ports:
> > > > > >  1. NM has the following ports:
> > > > > >      a. RPC port for app masters to launch containers (this is
> > > > > > a random port).
> > > > > >      b. RPC port for the localization service (port 8040).
> > > > > >      c. Web port for NM's UI (port 8042).
> > > > > >    2. Ports (a) and (c) are relayed to RM when NM registers
> > > > > > with RM. Port (b) is passed to a local container executor
> > > > > > process via command line args.
> > > > > >    3. As you rightly reckon, we need a mechanism at launch of
> > > > > > NM to pass the Mesos-allocated ports to NM for the above
> > > > > > interfaces. We can try to use the variable expansion mechanism
> > > > > > Hadoop has
> > > > > > (http://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/conf/Configuration.html)
> > > > > > to achieve this.
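> > > > > >
> > > > > > (Roughly what I have in mind; the "nm.webapp.address" property
> > > > > > name and the port value below are purely hypothetical. The idea
> > > > > > is that yarn-site.xml stays static and the NM launcher supplies
> > > > > > the Mesos-allocated port as a system property:)
> > > > > >
> > > > > >     <!-- yarn-site.xml, same on every node -->
> > > > > >     <property>
> > > > > >       <name>yarn.nodemanager.webapp.address</name>
> > > > > >       <value>${nm.webapp.address}</value>
> > > > > >     </property>
> > > > > >
> > > > > >     # set by the NM launcher with the port Mesos handed out (hypothetical)
> > > > > >     export YARN_NODEMANAGER_OPTS="-Dnm.webapp.address=0.0.0.0:31907"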
> > > > > >
> > > > > > Thanks,
> > > > > > Santosh
> > > > > >
> > > > > > On Thu, May 7, 2015 at 3:51 AM, John Omernik <[email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > I've implemented mesos-dns and use Marathon to launch my
> > > > > > > Myriad framework. It shows up as myriad.marathon.mesos and
> > > > > > > makes it easy to find which node the framework launched the
> > > > > > > resource manager on.
> > > > > > >
> > > > > > >  What if we made Myriad mesos-dns aware, so that prior to
> > > > > > > launching the YARN RM, it could register in mesos-dns? This
> > > > > > > would mean both the IP addresses and the ports (we need to
> > > > > > > figure out multiple ports in mesos-dns). Then it could write
> > > > > > > out ports and hostnames in the NM configs by checking
> > > > > > > mesos-dns for which ports the resource manager is using.
> > > > > > >
> > > > > > > Side question: when a node manager registers with the
> > > > > > > resource manager, are the ports the NM is running on
> > > > > > > completely up to the NM? I.e., I can run my NM web server on
> > > > > > > any port, and YARN just explains that to the RM on
> > > > > > > registration? Because then we need a mechanism at launch of
> > > > > > > the NM task to understand which ports Mesos has allocated to
> > > > > > > the NM and update the yarn-site for that NM before launch....
> > > > > > > Perhaps mesos-dns as a requirement isn't needed, but I am
> > > > > > > trying to walk through options that get us closer to multiple
> > > > > > > YARN clusters on a Mesos cluster.
> > > > > > >
> > > > > > > John
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Sent from my iThing
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Sent from my iThing
