Trying to send the image again, this time as an attachment. Regards, Swapnil
On Wed, May 20, 2015 at 5:43 PM, Swapnil Daingade <[email protected]> wrote:
> Hi John,
>
> Are you suggesting something like this?
>
> In issue 96 we are proposing something that will not require port mapping.
> Can you take a look and give your thoughts:
> https://github.com/mesos/myriad/issues/96
>
> Regards,
> Swapnil
>
> On Fri, May 15, 2015 at 6:44 AM, John Omernik <[email protected]> wrote:
>
> > This is true. In this setup though, we wouldn't be using the "random ports". We'd be assigning the ports that will be used by the RM (the 5) per cluster (with config changes) ahead of time. That is what the RM would know as its ports. At that point, when Marathon spins up an RM, HAProxy would take the service ports (which would be the same ports the RM "thinks" it is running on) and forward them to the ports that Mesos has proxied (in the available ports list). I've done this in Docker, but not with native Marathon-run processes. I need to look into that more.
> >
> > One concern I have with HAProxy is long-running TCP connections (I am not sure if this applies to YARN/RM). In one particular use case, running a Hive Thrift (HiveServer2) service in Docker on the Mesos cluster behind HAProxy, I found that if I submitted a long query, the query would go through, but HAProxy would see no traffic on the connection for a while and kill the proxy to the backend. This was annoying to say the least. Would this occur here as well? I really think that if the haproxy-marathon-bridge is used, we'd have to be certain that condition can't occur, even hidden. (I would hate for that condition to occur but have it go unaddressed because YARN is able to "reset" without error, just adding a bit of latency to the process.)
> >
> > So other than the HAProxy weirdness I saw, that approach could work, and then mesos-dns is just a nice component for administrators and users. What do I mean by that?
> >
> > Well, let's say you have a cluster of node1, node2, node3, and node4.
> >
> > You assign the 5 YARN ports (and service ports) for that cluster to be 15000, 15001, 15002, 15003, 15004.
> >
> > Myriad starts a node manager. It sets in the RM config (and all NM configs) the ports based on the 5 above.
> >
> > Mesos grabs 5 random ports in its allowed range (say, 30000 to 31000).
> >
> > When Mesos starts the RM process, let's say it starts it on node2.
> >
> > Node2 now has ports 30000, 30001, 30002, 30003, and 30004 listening and is forwarding those to 15000, 15001, 15002, 15003, and 15004 on the listening process. (Note: I know this is doable with Docker-contained processes; can Marathon do it outside of Docker?)
> >
> > Now HAProxy's config is updated: on EVERY node, ports 15000-15004 are listening and forwarding to node2 on ports 30000-30004.
> >
> > To your point on "needing" mesos-dns: technically no, we don't need it. We can tell our NMs to connect to any node on ports 15000-15004. This will work. But we may get added latency (rack-to-rack forwarding, extra hops, etc.).
> >
> > Instead, if we set the NMs to connect to myriad-dev-1.marathon.mesos, it could return an IP that is THE node the RM is running on. That way we get the advantage of having the NMs connect to the box with the process. HAProxy takes the requests and sends them to the Mesos ports (30000-30004), which Mesos then forwards to the process on ports 15000-15004.
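To make the forwarding above concrete, here is a minimal haproxy.cfg sketch for the first of the five service ports, using the example names from this thread (node2, ports 15000/30000). It is hand-written for illustration, not output of the haproxy-marathon-bridge script; generous client/server timeouts are one way to keep HAProxy from killing the idle, long-lived connections described above:

    # illustration only: node2 and the 15000->30000 mapping are the
    # example values from this thread, not generated config
    listen yarn_rm_port0
        bind *:15000
        mode tcp
        timeout client 1h     # don't kill idle client connections (long queries)
        timeout server 1h     # likewise on the backend side
        server rm node2:30000 check

    # ...repeated for 15001-15004 -> node2:30001-30004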
> > So without mesos-dns, you just connect to any node on the service ports and it "works", but when it comes to self-documentation, connecting to myriad-dev-1.marathon.mesos seems more descriptive than saying the NM is on node2.yourdomain. Especially when it's not... potential for administrative confusion.
> >
> > With mesos-dns, you connect to the descriptive name, and it works. But then, given my concerns with HAProxy, do we even NEED it? All HAProxy is doing at that point is opening a port on a node and sending traffic to another Mesos-approved port, only to send it on to the same port the process is listening on. Are we adding complexity?
> >
> > This is a great discussion, as it speaks to some intrinsic challenges that exist in data center OSes :)
> >
> > On Thu, May 14, 2015 at 1:50 PM, Santosh Marella <[email protected]> wrote:
> >
> > > I might be missing something, but I didn't understand why mesos-dns would be required in addition to HAProxy. If we configure RM to bind to random ports, but have RM reachable via HAProxy on RM's service ports, won't all the clients (such as NMs/HiveServer2, etc.) just use HAProxy to reach RM? If yes, why is mesos-dns needed?
> > >
> > > I have very limited knowledge about HAProxy configuration in a Mesos cluster. I just read through this doc: https://docs.mesosphere.com/getting-started/service-discovery/ and what I inferred is that an HAProxy instance runs on every slave node, and if an NM running on a slave node has to reach the RM, it would simply use an RM address that looks like "localhost:99999" (where 99999 is an admin-identified RPC service port for the RM). Since HAProxy on the NM's localhost listens on 99999, it just forwards the traffic to the RM's IP:RandomPort. Am I understanding this correctly?
> > >
> > > Thanks,
> > > Santosh
> > >
> > > On Tue, May 12, 2015 at 5:41 AM, John Omernik <[email protected]> wrote:
> > >
> > > > The challenge, I think, is the ports. So we have 5 ports that are needed for an RM; do we predefine those? I think Yuliya is saying yes, we should. An interesting compromise: rather than truly random ports, when we define a YARN cluster, we take on the responsibility to define our 5 "service" ports using the Marathon/HAProxy service ports. (This now requires HAProxy as well as mesos-dns. I'd recommend some work being done on documenting HAProxy for use with the haproxy script; I know that I stumbled a bit trying to get HAProxy set up, but that may just be my own lack of knowledge on the subject.) These ports will have to be available across the cluster, and will map to whichever ports Mesos assigns to the RM.
> > > >
> > > > This makes sense to me. A "YARN cluster creation" event on a Mesos cluster is something we want to be flexible, but it's not something that will likely be "self service". I.e., we won't have users just creating YARN clusters at will. It will likely be something where, when requested, the admin can identify 5 available service ports and lock those into that cluster... that way, when the YARN RM spins up, it has its service ports defined (and thus the node managers always know which ports to connect to).
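As a sketch of this "admin assigns 5 service ports per cluster" idea: in a Marathon app definition of that era, the "ports" array declares the service ports, while the host ports Mesos actually assigns are handed to the task as $PORT0-$PORT4 environment variables (with requirePorts left at its default of false). The id and cmd below are placeholders, not Myriad's real launch command:

    {
      "id": "yarn-prod-rm",
      "cmd": "bin/yarn resourcemanager",
      "cpus": 2,
      "mem": 4096,
      "instances": 1,
      "ports": [15000, 15001, 15002, 15003, 15004]
    }

(JSON carries no comments, so to be explicit: everything above is illustrative, only the five service ports come from the thread's example.)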
> > > > Combined with Mesos-DNS, this could actually work out very well, as the name of the RM can be hard-coded, and the ports will just work no matter which node it spins up on.
> > > >
> > > > From an HA perspective, the only advantage at this point of preallocating the failover RM is speed of recovery (and a guarantee of resources being available if failover occurs). Perhaps we could consider this as an option for those who need fast or guaranteed recovery, but not make it a requirement?
> > > >
> > > > The service port method will not work, however, for the node manager ports. That said, I "believe" that as Myriad spins up a node manager, it can dynamically allocate the ports and thus report them to the resource manager on registration. Someone may need to help me out on that one, as I am not sure. Also, since the node manager is host-specific, mesos-dns is not required; it can register with the resource manager using whatever ports are allocated and the hostname it's running on. I guess the question here is: when Myriad requests the resources and Mesos allocates the ports, can Myriad, prior to actually starting the node manager, update the configs with the allocated ports? Or is this even needed?
> > > >
> > > > This is a great discussion.
> > > >
> > > > On Mon, May 11, 2015 at 9:58 PM, yuliya Feldman <[email protected]> wrote:
> > > >
> > > > > As far as I understand, in this case Apache YARN RM HA will kick in, which means the ids, hosts, and ports for all RMs will need to be defined somewhere, and I wonder how that will be done in this situation, since those need to be either in yarn-site.xml or passed with "-D". In the case of Mesos-DNS usage there is no need to set up RM HA at all, and no warm standby is needed: Marathon will start the RM somewhere in case of failure, and clients will rediscover it based on the same hostname. Am I missing anything?
> > > > >
> > > > > From: Adam Bordelon <[email protected]>
> > > > > To: [email protected]
> > > > > Sent: Monday, May 11, 2015 7:26 PM
> > > > > Subject: Re: Recommending or requiring mesos dns?
> > > > >
> > > > > I'm a +1 for random ports. You can also use Marathon's servicePort field to let HAProxy redirect from the servicePort to the actual hostPort for the service on each node. Mesos-DNS will similarly direct you to the correct host:port given the appropriate task name.
> > > > >
> > > > > Is there a reason we can't just have Marathon launch two RM tasks for the same YARN cluster? One would be the leader, and the other would redirect to it until failover. Once one fails over, the other will start taking traffic, and Marathon will try to launch a new backup RM when the resources are available. If the YARN RM cannot provide us this functionality on its own, perhaps we can write a simple wrapper script for it.
> > > > >
> > > > > On Fri, May 8, 2015 at 11:57 AM, John Omernik <[email protected]> wrote:
> > > > >
> > > > > > I would advocate random ports because there should not be a limitation of running only one RM per node.
> > > > > > If we want true portability, there should be the ability to have the RM for cluster YarnProd run on node1 and also have the RM for cluster YarnDev running on node1 (if it so happens to land that way). That way the number of clusters isn't limited by the number of physical nodes.
> > > > > >
> > > > > > On Fri, May 8, 2015 at 1:33 PM, Santosh Marella <[email protected]> wrote:
> > > > > >
> > > > > > > RM can store its data either in HDFS or in ZooKeeper. The data store is configurable. There is a config property in YARN (yarn.resourcemanager.recovery.enabled) that tells RM whether it should try to recover from the state store the metadata about previously submitted apps, the containers allocated to them, etc.
> > > > > > >
> > > > > > > Preallocation of a backup RM is a great idea. Thinking about it a bit more, I felt it might be better to have such an option available in Marathon rather than building it into Myriad (and into every framework/service that wants HA/failover).
> > > > > > >
> > > > > > > Let's say we launch a service X via Marathon that requires some resources (cpus/mem/ports), and we want 1 instance of that service to always be available. Marathon promises restart of the service if it goes down. But, as far as I understand, Marathon can restart the service on another node only if the resources required by service X are available on that node *after* the service goes down. In other words, Marathon doesn't proactively "reserve" these resources on another node as a backup for failover.
> > > > > > >
> > > > > > > Again, not all services launched via Marathon require this, but perhaps there should be a config option to specify whether a service wants Marathon to keep a backup node ready to go in the event of failure.
> > > > > > >
> > > > > > > On Thu, May 7, 2015 at 4:12 PM, John Omernik <[email protected]> wrote:
> > > > > > >
> > > > > > > > So I may be looking at this wrong, but where is the data for the RM stored if it does fail over? How will it know to pick up where it left off? This is just one area where my understanding is limited.
> > > > > > > >
> > > > > > > > That said, what about preallocating a second, failover RM somewhere on the cluster? (I am just tossing out an idea here; there are probably many reasons not to do this.) Here is how I could see it happening.
> > > > > > > >
> > > > > > > > 1. Myriad starts an RM, asking for 5 random available ports. Mesos replies, starting the RM, and reports to Myriad the 5 ports used for the services you listed below.
> > > > > > > >
> > > > > > > > 2. Myriad then checks a config value for the number of "hot spares"; let's say we specify 1.
> > > > > > > > Myriad then puts in a resource request to Mesos for the CPU and memory required for the RM, but specifically asks for the same 5 ports allocated to the first. Basically, it reserves a spot on another node with the same ports available. It may take a bit, but there should be that availability. Until this request is met, the YARN cluster is in an HA-compromised position.
> > > > > > >
> > > > > > > This is exactly what I think we should do, but why use random ports instead of standard RM ports? If you have 10 slave nodes in your Mesos cluster, then there are 10 potential spots for the RM to be launched on. However, if you choose to launch multiple RMs (multiple YARN clusters), then you can probably launch at most 5 (with the remaining 5 nodes available ...
> > > > > > > >
> > > > > > > > 3. At this point, perhaps we start another instance of the RM right away (this depends on my first question about where the RM stores info about jobs/applications), or the framework just holds the spot, waiting for a lack of heartbeat (failover condition) on the primary resource manager.
> > > > > > > >
> > > > > > > > 4. If we can run the spare with no issues, it's a simple update of the DNS record, and the node managers connect to the new RM (and another RM is preallocated for redundancy). If we can't actually execute the secondary RM until failover conditions, we can now execute the new RM, and the ports will be the same.
> > > > > > > >
> > > > > > > > This may seem kludgey at first, but done correctly it may actually limit the length of failover, since the RM is preallocated. RMs are not huge from a resource perspective, so it may be a small cost for those who want failover and multiple clusters (and thus have dynamic ports).
> > > > > > > >
> > > > > > > > I will keep thinking this through, and would welcome feedback.
> > > > > > > >
> > > > > > > > On Thursday, May 7, 2015, Santosh Marella <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi John,
> > > > > > > > >
> > > > > > > > > Great views about extending mesos-dns for the RM's discovery. Some thoughts:
> > > > > > > > >
> > > > > > > > > 1. There are 5 primary interfaces RM exposes that are bound to standard ports:
> > > > > > > > >    a. RPC interface for clients that want to submit applications to YARN (port 8032).
> > > > > > > > >    b. RPC interface for NMs to connect back/heartbeat to RM (port 8031).
> > > > > > > > >    c. RPC interface for App Masters to connect back/heartbeat to RM (port 8030).
> > > > > > > > >    d. RPC interface for admins to interact with RM via CLI (port 8033).
> > > > > > > > >    e. Web interface for RM's UI (port 8088).
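For reference, the five interfaces listed above correspond to standard yarn-site.xml properties, so pinning a cluster's service ports could look roughly like the sketch below (hostname and ports taken from John's earlier example, not from Myriad; only two of the five shown):

    <!-- sketch only: host and ports are this thread's example values -->
    <property>
      <name>yarn.resourcemanager.address</name>
      <value>myriad-dev-1.marathon.mesos:15000</value>
      <!-- (a) client RPC, normally 8032 -->
    </property>
    <property>
      <name>yarn.resourcemanager.resource-tracker.address</name>
      <value>myriad-dev-1.marathon.mesos:15001</value>
      <!-- (b) NM heartbeat, normally 8031 -->
    </property>
    <!-- likewise yarn.resourcemanager.scheduler.address (c),
         yarn.resourcemanager.admin.address (d),
         and yarn.resourcemanager.webapp.address (e) -->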
> > > > > > > > > 2. When we launch RM using Marathon, it's probably better to mention in Marathon's config that RM will use the above ports. This is because, if RM listens on random ports (as opposed to the standard ports listed above), then when RM fails over, the new RM gets ports that might be different from the ones used by the old RM. That makes the RM's discovery hard, especially post-failover.
> > > > > > > > >
> > > > > > > > > 3. It looks like what you are proposing is a way to tell mesos-dns which ports RM's services are listening on, and when RM fails over, these ports would get updated in mesos-dns. Is my understanding correct? If yes, one challenge I see is that the clients that want to connect to the above-listed RM interfaces also need to pull the changes to RM's port numbers from mesos-dns dynamically. I am not sure how that might be possible.
> > > > > > > > >
> > > > > > > > > Regarding your question about NM ports:
> > > > > > > > >
> > > > > > > > > 1. NM has the following ports:
> > > > > > > > >    a. RPC port for app masters to launch containers (this is a random port).
> > > > > > > > >    b. RPC port for the localization service (port 8040).
> > > > > > > > >    c. Web port for NM's UI (port 8042).
> > > > > > > > >
> > > > > > > > > 2. Ports (a) and (c) are relayed to RM when NM registers with RM. Port (b) is passed to a local container-executor process via command-line args.
> > > > > > > > >
> > > > > > > > > 3. As you rightly reckon, we need a mechanism at NM launch to pass the Mesos-allocated ports to NM for the above interfaces. We can try to use the variable expansion mechanism Hadoop has (http://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/conf/Configuration.html) to achieve this.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Santosh
> > > > > > > > >
> > > > > > > > > On Thu, May 7, 2015 at 3:51 AM, John Omernik <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > I've implemented mesos-dns and use Marathon to launch my Myriad framework. It shows up as myriad.marathon.mesos and makes it easy to find which node the framework launched the resource manager on.
> > > > > > > > > >
> > > > > > > > > > What if we made Myriad mesos-dns aware, so that prior to launching the YARN RM, it could register in mesos-dns? This would mean both the IP addresses and the ports (we need to figure out multiple ports in mesos-dns). Then it could write out the ports and hostnames in the NM configs by checking mesos-dns for which ports the resource manager is using.
> > > > > > > > > >
> > > > > > > > > > Side question: when a node manager registers with the resource manager, are the ports the NM is running on completely up to the NM?
> > > > > > > > > > I.e., I can run my NM web server on any port, and YARN just explains that to the RM on registration? Because then we need a mechanism at launch of the NM task to understand which ports Mesos has allocated to the NM and update the yarn-site for that NM before launch.... Perhaps mesos-dns as a requirement isn't needed, but I am trying to walk through options that get us closer to multiple YARN clusters on a Mesos cluster.
> > > > > > > > > >
> > > > > > > > > > John
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Sent from my iThing
> > > > > > > >
> > > > > > > > --
> > > > > > > > Sent from my iThing
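On Santosh's variable-expansion suggestion: a rough sketch of how Mesos-allocated ports might be threaded into an NM's config, assuming the launch command can see them as environment variables. The $PORT0/$PORT1 names follow Marathon's convention and are hypothetical here; yarn.nodemanager.webapp.address and yarn.nodemanager.localizer.address are the standard properties behind the 8042 and 8040 ports listed above. The NM would be launched with the ports injected as system properties, e.g.

    # nm.webapp.port / nm.localizer.port are made-up keys for this sketch
    export YARN_NODEMANAGER_OPTS="-Dnm.webapp.port=$PORT0 -Dnm.localizer.port=$PORT1"

and yarn-site.xml would reference them, relying on Hadoop Configuration's ${...} expansion from system properties:

    <property>
      <name>yarn.nodemanager.webapp.address</name>
      <value>0.0.0.0:${nm.webapp.port}</value>
    </property>
    <property>
      <name>yarn.nodemanager.localizer.address</name>
      <value>0.0.0.0:${nm.localizer.port}</value>
    </property>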
