Hi John, are you suggesting something like this?
In issue 96 we are proposing something that will not require port mapping. Can you take a look and give your thoughts? https://github.com/mesos/myriad/issues/96

Regards,
Swapnil

On Fri, May 15, 2015 at 6:44 AM, John Omernik <[email protected]> wrote:

> This is true. In this setup, though, we wouldn't be using the "random ports". We'd be assigning the ports that will be used by the RM (the 5) per cluster (with config changes) ahead of time. That is what the RM would know as its ports. At this point, when Marathon spins up an RM, HAProxy would take the service ports (which would be the same ports the RM "thinks" it is running on) and forward them to the ports that Mesos has proxied (in the available ports list). I've done this in Docker, but not on native Marathon-run processes. I need to look into that more.

> One concern I have with HAProxy is long-running TCP connections (I am not sure if this applies to YARN/RM). Basically, in one particular use case, running a Hive Thrift (hiveserver2) service in Docker on the Mesos cluster with HAProxy, I found that if I submitted a long query, the query would be submitted, HAProxy would not see connections for a while, and it would kill the proxy to the backend. This was annoying to say the least. Would this occur with HAProxy? I really think that if the haproxy-marathon bridge were used, we'd have to be certain that condition wouldn't occur, even hidden. (I would hate for that condition to occur, have YARN "reset" without error while adding a bit of latency to the process, and have that go unaddressed.)

> So other than the HAProxy weirdness I saw, that approach could work, and then mesos-dns is just a nice component for administrators and users. What do I mean by that?

> Well, let's say you have a cluster of node1, node2, node3, and node4.

> You assign the 5 YARN ports (and service ports) for that cluster to be 15000, 15001, 15002, 15003, 15004.

> Myriad starts a node manager. It sets the ports in the RM config (and all NM configs) based on the 5 above.

> Mesos grabs 5 random ports in its allowed range (default 30000 to 31000).

> When Mesos starts the RM process, let's say it starts it on node2.

> Node2 now has ports 30000, 30001, 30002, 30003, and 30004 listening and is forwarding those to 15000, 15001, 15002, 15003, and 15004 on the listening process. (Note: I know this is doable with Docker-contained processes; can Marathon do it outside of Docker?)

> Now HAProxy's config is updated. On EVERY node, the ports 15000-15004 are listening and are forwarding to node2 on ports 30000-30004.

> To your point on "needing" mesos-dns: technically no, we don't need it. We can tell our NMs to connect to any node on ports 15000-15004. This will work. But we may get added latency (rack-to-rack forwarding, extra hops, etc.).

> Instead, if we set the NMs to connect to myriad-dev-1.marathon.mesos, it could return an IP that is THE node it's running on. That way we get the advantage of having the NMs connect to the box with the process. HAProxy takes the requests and sends them to the Mesos ports (30000-30004), which Mesos then sends to the process on ports 15000-15004.

> So without mesos-dns: you just connect to any node on the service ports and it "works", but when it comes to self-documentation, connecting to myriad-dev-1.marathon.mesos seems more descriptive than saying the RM is on node2.yourdomain.
> Especially when it's not... there is potential for administrative confusion.

> With mesos-dns, you connect to the descriptive name, and it works. But then, given my concerns with HAProxy, do we even NEED it? All HAProxy is doing at that point is opening a port on a node and sending to another Mesos-approved port, only to send it to the same port the process is listening on. Are we adding complexity?

> This is a great discussion, as it speaks to some intrinsic challenges that exist in data center OSes :)

> On Thu, May 14, 2015 at 1:50 PM, Santosh Marella <[email protected]> wrote:

> > I might be missing something, but I didn't understand why mesos-dns would be required in addition to HAProxy. If we configure RM to bind to random ports, but have RM reachable via HAProxy on RM's service ports, won't all the clients (such as NMs/HiveServer2 etc.) just use HAProxy to reach the RM? If yes, why is mesos-dns needed?

> > I have very limited knowledge about HAProxy configuration in a Mesos cluster. I just read through this doc: https://docs.mesosphere.com/getting-started/service-discovery/ and what I inferred is that a HAProxy instance runs on every slave node, and if an NM running on a slave node has to reach the RM, it would simply use an RM address that looks like "localhost:99999" (where 99999 is an admin-identified RPC service port for the RM). Since HAProxy on the NM's localhost listens on 99999, it just forwards the traffic to the RM's IP:RandomPort. Am I understanding this correctly?

> > Thanks,
> > Santosh

> > On Tue, May 12, 2015 at 5:41 AM, John Omernik <[email protected]> wrote:

> > > The challenge I think is the ports. So we have 5 ports that are needed for an RM; do we predefine those? I think Yuliya is saying yes, we should. An interesting compromise... rather than truly random ports, when we define a YARN cluster, we have the responsibility to define our 5 "service" ports using the Marathon/HAProxy service ports. (This now requires HAProxy as well as mesos-dns. I'd recommend some work being done on documenting HAProxy for use with the haproxy script; I know that I stumbled a bit trying to get HAProxy set up, but that just may be my own lack of knowledge on the subject.) These ports will have to be available across the cluster, and will map to whichever ports Mesos assigns to the RM.

> > > This makes sense to me. A "YARN cluster creation" event on a Mesos cluster is something we want to be flexible, but it's not something that will likely be "self service". I.e., we won't have users just creating YARN clusters at will. It will likely be something that, when requested, the admin can identify 5 available service ports and lock those into that cluster... that way, when the YARN RM spins up, it has its service ports defined (and thus the node managers always know which ports to connect to). Combined with mesos-dns, this could actually work out very well, as the name of the RM can be hard coded, and the ports will just work no matter which node it spins up on.

> > > From an HA perspective, the only advantage at this point of preallocating the failover RM is speed of recovery (and a guarantee of resources being available if failover occurs). Perhaps we could consider this as an option for those who need fast or guaranteed recovery, but not make it a requirement?
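For concreteness, the per-cluster yarn-site.xml under the service-port scheme described above might look roughly like the sketch below. The property names are the standard YARN RM address settings; the mesos-dns hostname and the 15000-15004 service ports are taken from the example in this thread, and the assignment of each port to each interface is purely illustrative, not a tested configuration.

    <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>myriad-dev-1.marathon.mesos</value>  <!-- mesos-dns name from the example above -->
    </property>
    <property>
      <name>yarn.resourcemanager.address</name>  <!-- client RPC, normally 8032 -->
      <value>${yarn.resourcemanager.hostname}:15000</value>
    </property>
    <property>
      <name>yarn.resourcemanager.scheduler.address</name>  <!-- AM RPC, normally 8030 -->
      <value>${yarn.resourcemanager.hostname}:15001</value>
    </property>
    <property>
      <name>yarn.resourcemanager.resource-tracker.address</name>  <!-- NM RPC, normally 8031 -->
      <value>${yarn.resourcemanager.hostname}:15002</value>
    </property>
    <property>
      <name>yarn.resourcemanager.admin.address</name>  <!-- admin CLI, normally 8033 -->
      <value>${yarn.resourcemanager.hostname}:15003</value>
    </property>
    <property>
      <name>yarn.resourcemanager.webapp.address</name>  <!-- RM web UI, normally 8088 -->
      <value>${yarn.resourcemanager.hostname}:15004</value>
    </property>

With something like this in place, HAProxy on every node would listen on 15000-15004 and forward to the Mesos-assigned host ports (30000-30004 in the example) on whichever node is actually running the RM.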
> > > The service port method will not work, however, for the node manager ports. That said, I "believe" that as Myriad spins up a node manager, it can dynamically allocate the ports and thus report those to the resource manager on registration. Someone may need to help me out on that one, as I am not sure. Also, since the node manager is host-specific, mesos-dns is not required; it can register to the resource manager with whatever ports are allocated and the hostname it's running on. I guess the question here is: when Myriad requests the resources and Mesos allocates the ports, can Myriad, prior to actually starting the node manager, update the configs with the allocated ports? Or is this even needed?

> > > This is a great discussion.

> > > On Mon, May 11, 2015 at 9:58 PM, yuliya Feldman <[email protected]> wrote:

> > > > As far as I understand, in this case Apache YARN RM HA will kick in, which means all the ids, hosts, and ports for all RMs will need to be defined somewhere, and I wonder how that will be defined in this situation, since those either need to be in yarn-site.xml or passed using "-D". In the case of Mesos-DNS usage there is no need to set up RM HA at all, and no warm standby is needed. Marathon will start RM somewhere in case of failure and clients will rediscover it based on the same hostname. Am I missing anything?

> > > > From: Adam Bordelon <[email protected]>
> > > > To: [email protected]
> > > > Sent: Monday, May 11, 2015 7:26 PM
> > > > Subject: Re: Recommending or requiring mesos dns?

> > > > I'm a +1 for random ports. You can also use Marathon's servicePort field to let HAProxy redirect from the servicePort to the actual hostPort for the service on each node. Mesos-DNS will similarly direct you to the correct host:port given the appropriate task name.

> > > > Is there a reason we can't just have Marathon launch two RM tasks for the same YARN cluster? One would be the leader, and the other would redirect to it until failover. Once one fails over, the other will start taking traffic, and Marathon will try to launch a new backup RM when the resources are available. If the YARN RM cannot provide us this functionality on its own, perhaps we can write a simple wrapper script for it.

> > > > On Fri, May 8, 2015 at 11:57 AM, John Omernik <[email protected]> wrote:

> > > > > I would advocate random ports because there should not be a limitation of running only one RM per node. If we want true portability, there should be the ability to have the RM for the cluster YarnProd run on node1 and also have the RM for the cluster YarnDev running on node1 (if it so happens to land that way). That way the number of clusters isn't limited by the number of physical nodes.

> > > > > On Fri, May 8, 2015 at 1:33 PM, Santosh Marella <[email protected]> wrote:

> > > > > > RM can store its data either in HDFS or in ZooKeeper. The data store is configurable. There is a config property in YARN (yarn.resourcemanager.recovery.enabled) that tells RM whether it should try to recover the metadata about the previously submitted apps, the containers allocated to them, etc., from the state store.
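As a rough illustration of the configurable state store mentioned here, the ZooKeeper-backed variant is typically wired up in yarn-site.xml with properties along these lines; the ZooKeeper quorum is a placeholder, and an HDFS-backed store would use FileSystemRMStateStore and yarn.resourcemanager.fs.state-store.uri instead.

    <property>
      <name>yarn.resourcemanager.recovery.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.resourcemanager.store.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
    <property>
      <name>yarn.resourcemanager.zk-address</name>
      <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>  <!-- placeholder quorum -->
    </property>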
> > > > > > Pre-allocation of a backup RM is a great idea. Thinking about it a bit more, I felt it might be better to have such an option available in Marathon rather than building it in Myriad (and in all frameworks/services that want HA/failover).

> > > > > > Let's say we launch a service X via Marathon that requires some resources (cpus/mem/ports) and we want 1 instance of that service to always be available. Marathon promises restart of the service if it goes down. But, as far as I understand, Marathon can restart the service on another node only if the resources required by service X are available on that node *after* the service goes down. In other words, Marathon doesn't proactively "reserve" these resources on another node as a backup for failover.

> > > > > > Again, not all services launched via Marathon require this, but perhaps there should be a config option to specify whether a service desires to have Marathon keep a backup node ready-to-go in the event of failure.

> > > > > > On Thu, May 7, 2015 at 4:12 PM, John Omernik <[email protected]> wrote:

> > > > > > > So I may be looking at this wrong, but where is the data for the RM stored if it does fail over? How will it know to pick up where it left off? This is just one area where I am low in understanding.

> > > > > > > That said, what about pre-allocating a second failover RM somewhere on the cluster? (I am just tossing out an idea here; there are probably many reasons not to do this.) But here is how I could see it happening.

> > > > > > > 1. Myriad starts an RM asking for 5 random available ports. Mesos replies, starting the RM, and reports to Myriad the 5 ports used for the services you listed below.

> > > > > > > 2. Myriad then checks a config value for the number of "hot spares"; let's say we specify 1. Myriad then puts in a resource request to Mesos for the CPU and memory required for the RM, but specifically asks for the same 5 ports allocated to the first. Basically it reserves a spot on another node with the same ports available. It may take a bit, but there should be that availability. Until this request is met, the YARN cluster is in an HA-compromised position.

> > > > > > This is exactly what I think we should do, but why use random ports instead of standard RM ports? If you have 10 slave nodes in your Mesos cluster, then there are 10 potential spots for RM to be launched on. However, if you choose to launch multiple RMs (multiple YARN clusters), then you can probably launch at most 5 (with the remaining 5 nodes available
> > > > > > > 3. At this point, perhaps we start another instance of the RM right away (depends on my first question about where the RM stores info about jobs/applications), or the framework just holds the spot, waiting for a lack of heartbeat (failover condition) on the primary resource manager.

> > > > > > > 4. If we can run the spare with no issues, it's a simple update of the DNS record and node managers connect to the new RM (and another RM is preallocated for redundancy). If we can't actually execute the secondary RM until failover conditions occur, we can now execute the new RM, and the ports will be the same.

> > > > > > > This may seem kludgey at first, but done correctly, it may actually limit the length of failover time as the RM is preallocated. RMs are not huge from a resource perspective, so it may be a small cost for those who want failover and multiple clusters (and thus dynamic ports).

> > > > > > > I will keep thinking this through, and would welcome feedback.

> > > > > > > On Thursday, May 7, 2015, Santosh Marella <[email protected]> wrote:

> > > > > > > > Hi John,

> > > > > > > > Great views about extending mesos-dns for RM's discovery. Some thoughts:

> > > > > > > > 1. There are 5 primary interfaces RM exposes that are bound to standard ports:
> > > > > > > >    a. RPC interface for clients that want to submit applications to YARN (port 8032).
> > > > > > > >    b. RPC interface for NMs to connect back/HB to RM (port 8031).
> > > > > > > >    c. RPC interface for App Masters to connect back/HB to RM (port 8030).
> > > > > > > >    d. RPC interface for admins to interact with RM via CLI (port 8033).
> > > > > > > >    e. Web interface for RM's UI (port 8088).

> > > > > > > > 2. When we launch RM using Marathon, it's probably better to mention in Marathon's config that RM will use the above ports. This is because, if RM listens on random ports (as opposed to the above listed standard ports), then when RM fails over, the new RM gets ports that might be different from the ones used by the old RM. This makes the RM's discovery hard, especially post failover.

> > > > > > > > 3. It looks like what you are proposing is a way to update mesos-dns as to what ports RM's services are listening on. And when RM fails over, these ports would get updated in mesos-dns. Is my understanding correct? If yes, one challenge I see is that the clients that want to connect to the above listed RM interfaces also need to pull the changes to RM's port numbers from mesos-dns dynamically. Not sure how that might be possible.
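For reference, yuliya's earlier point about having to define ids, hosts, and ports for all RMs refers to the standard YARN RM HA settings, which look roughly like the sketch below in yarn-site.xml (the cluster id and hostnames are placeholders). This is the configuration that the mesos-dns/Marathon-restart approach discussed in this thread would avoid.

    <property>
      <name>yarn.resourcemanager.ha.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.resourcemanager.cluster-id</name>
      <value>yarn-dev</value>  <!-- placeholder cluster id -->
    </property>
    <property>
      <name>yarn.resourcemanager.ha.rm-ids</name>
      <value>rm1,rm2</value>
    </property>
    <property>
      <name>yarn.resourcemanager.hostname.rm1</name>
      <value>node1.yourdomain</value>  <!-- placeholder host -->
    </property>
    <property>
      <name>yarn.resourcemanager.hostname.rm2</name>
      <value>node2.yourdomain</value>  <!-- placeholder host -->
    </property>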
> > > > > > > > Regarding your question about NM ports:

> > > > > > > > 1. NM has the following ports:
> > > > > > > >    a. RPC port for app masters to launch containers (this is a random port).
> > > > > > > >    b. RPC port for the localization service (port 8040).
> > > > > > > >    c. Web port for NM's UI (port 8042).

> > > > > > > > 2. Ports (a) and (c) are relayed to RM when NM registers with RM. Port (b) is passed to a local container executor process via command line args.

> > > > > > > > 3. As you rightly reckon, we need a mechanism at launch of NM to pass the Mesos-allocated ports to NM for the above interfaces. We can try to use the variable expansion mechanism Hadoop has (http://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/conf/Configuration.html) to achieve this.

> > > > > > > > Thanks,
> > > > > > > > Santosh

> > > > > > > > On Thu, May 7, 2015 at 3:51 AM, John Omernik <[email protected]> wrote:

> > > > > > > > > I've implemented mesos-dns and use Marathon to launch my Myriad framework. It shows up as myriad.marathon.mesos and makes it easy to find which node the framework launched the resource manager on.

> > > > > > > > > What if we made Myriad mesos-dns aware, and prior to launching the YARN RM, it could register in mesos-dns? This would mean both the IP addresses and the ports (we need to figure out multiple ports in mesos-dns). Then it could write out ports and hostnames in the NM configs by checking mesos-dns for which ports the resource manager is using.

> > > > > > > > > Side question: when a node manager registers with the resource manager, are the ports the NM is running on completely up to the NM? I.e., I can run my NM web server on any port, and YARN just explains that to the RM on registration? Because then we need a mechanism at launch of the NM task to understand which ports Mesos has allocated to the NM and update the yarn-site for that NM before launch.... Perhaps mesos-dns as a requirement isn't needed, but I am trying to walk through options that get us closer to multiple YARN clusters on a Mesos cluster.
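As a sketch of the variable-expansion idea for the NM ports above: the property names below are the standard NM settings for the three interfaces listed, while the ${nm.*.port} variables are hypothetical placeholders that whatever launches the NM (for example, a Myriad executor) would have to supply, e.g. as Java system properties via -D, using the ports Mesos allocated to the task.

    <!-- Sketch only: ${nm.rpc.port}, ${nm.localizer.port} and ${nm.webapp.port} are
         hypothetical properties the NM launcher would set (e.g. -Dnm.webapp.port=31002)
         from the Mesos-allocated ports. -->
    <property>
      <name>yarn.nodemanager.address</name>  <!-- container-management RPC; random by default -->
      <value>0.0.0.0:${nm.rpc.port}</value>
    </property>
    <property>
      <name>yarn.nodemanager.localizer.address</name>  <!-- localization service; normally 8040 -->
      <value>0.0.0.0:${nm.localizer.port}</value>
    </property>
    <property>
      <name>yarn.nodemanager.webapp.address</name>  <!-- NM web UI; normally 8042 -->
      <value>0.0.0.0:${nm.webapp.port}</value>
    </property>

Hadoop's Configuration expands ${...} against other configuration properties and Java system properties, so this only works if those values are injected before the NM reads its configuration.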
> > > > > > > > > John
