Trying to send the image again, this time as an attachment. Regards, Swapnil
On Wed, May 20, 2015 at 5:43 PM, Swapnil Daingade <[email protected]> wrote:
> Hi John,
>
> Are you suggesting something like this?
>
> In issue 96 we are proposing something that will not require port mapping.
> Can you take a look and give your thoughts:
> https://github.com/mesos/myriad/issues/96
>
> Regards,
> Swapnil
>
> On Fri, May 15, 2015 at 6:44 AM, John Omernik <[email protected]> wrote:
>
> > This is true. In this setup though, we wouldn't be using the "random ports". We'd be assigning the ports that will be used by the RM (the 5) per cluster (with config changes) ahead of time. That is what the RM would know as its ports. At that point, when Marathon spins up an RM, HAProxy would take the service ports (which would be the same ports the RM "thinks" it is running on) and forward them to the ports that Mesos has proxied (in the available ports list). I've done this in Docker, but not with native Marathon-run processes. I need to look into that more.
> >
> > One concern I have with HAProxy is long-running TCP connections (I am not sure if this applies to YARN/RM). In one particular use case, running a Hive Thrift (HiveServer2) service in Docker on the Mesos cluster behind HAProxy, I found that if I submitted a long query, the query would go through, but HAProxy would see no traffic on the connection for a while and kill the proxy to the backend. This was annoying to say the least. Would this occur here as well? I really think that if the haproxy-marathon-bridge is used, we'd have to be certain that condition can't occur, even hidden. (I would hate for that condition to occur but have it go unaddressed because YARN is able to "reset" without error, just adding a bit of latency to the process.)
> >
> > So other than the HAProxy weirdness I saw, that approach could work, and then mesos-dns is just a nice component for administrators and users. What do I mean by that?
> >
> > Well, let's say you have a cluster of node1, node2, node3, and node4.
> >
> > You assign the 5 YARN ports (and service ports) for that cluster to be 15000, 15001, 15002, 15003, 15004.
> >
> > Myriad starts a node manager. It sets in the RM config (and all NM configs) the ports based on the 5 above.
> >
> > Mesos grabs 5 random ports in its allowed range (say, 30000 to 31000).
> >
> > When Mesos starts the RM process, let's say it starts it on node2.
> >
> > Node2 now has ports 30000, 30001, 30002, 30003, and 30004 listening and is forwarding those to 15000, 15001, 15002, 15003, and 15004 on the listening process. (Note: I know this is doable with Docker-contained processes; can Marathon do it outside of Docker?)
> >
> > Now HAProxy's config is updated: on EVERY node, ports 15000-15004 are listening and forwarding to node2 on ports 30000-30004.
> >
> > To your point on "needing" mesos-dns: technically no, we don't need it. We can tell our NMs to connect to any node on ports 15000-15004. This will work. But we may get added latency (rack-to-rack forwarding, extra hops, etc.).
> >
> > Instead, if we set the NMs to connect to myriad-dev-1.marathon.mesos, it could return an IP that is THE node the RM is running on. That way we get the advantage of having the NMs connect to the box with the process. HAProxy takes the requests and sends them to the Mesos ports (30000-30004), which Mesos then forwards to the process on ports 15000-15004.
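To make the forwarding above concrete, here is a minimal haproxy.cfg sketch for the first of the five service ports, using the example names from this thread (node2, ports 15000/30000). It is hand-written for illustration, not output of the haproxy-marathon-bridge script; generous client/server timeouts are one way to keep HAProxy from killing the idle, long-lived connections described above:

    # illustration only: node2 and the 15000->30000 mapping are the
    # example values from this thread, not generated config
    listen yarn_rm_port0
        bind *:15000
        mode tcp
        timeout client 1h     # don't kill idle client connections (long queries)
        timeout server 1h     # likewise on the backend side
        server rm node2:30000 check

    # ...repeated for 15001-15004 -> node2:30001-30004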
> > So without mesos-dns, you just connect to any node on the service ports and it "works", but when it comes to self-documentation, connecting to myriad-dev-1.marathon.mesos seems more descriptive than saying the NM is on node2.yourdomain. Especially when it's not... potential for administrative confusion.
> >
> > With mesos-dns, you connect to the descriptive name, and it works. But then, given my concerns with HAProxy, do we even NEED it? All HAProxy is doing at that point is opening a port on a node and sending traffic to another Mesos-approved port, only to send it on to the same port the process is listening on. Are we adding complexity?
> >
> > This is a great discussion, as it speaks to some intrinsic challenges that exist in data center OSes :)
> >
> > On Thu, May 14, 2015 at 1:50 PM, Santosh Marella <[email protected]> wrote:
> >
> > > I might be missing something, but I didn't understand why mesos-dns would be required in addition to HAProxy. If we configure RM to bind to random ports, but have RM reachable via HAProxy on RM's service ports, won't all the clients (such as NMs/HiveServer2, etc.) just use HAProxy to reach RM? If yes, why is mesos-dns needed?
> > >
> > > I have very limited knowledge about HAProxy configuration in a Mesos cluster. I just read through this doc: https://docs.mesosphere.com/getting-started/service-discovery/ and what I inferred is that an HAProxy instance runs on every slave node, and if an NM running on a slave node has to reach the RM, it would simply use an RM address that looks like "localhost:99999" (where 99999 is an admin-identified RPC service port for the RM). Since HAProxy on the NM's localhost listens on 99999, it just forwards the traffic to the RM's IP:RandomPort. Am I understanding this correctly?
> > >
> > > Thanks,
> > > Santosh
> > >
> > > On Tue, May 12, 2015 at 5:41 AM, John Omernik <[email protected]> wrote:
> > >
> > > > The challenge, I think, is the ports. So we have 5 ports that are needed for an RM; do we predefine those? I think Yuliya is saying yes, we should. An interesting compromise: rather than truly random ports, when we define a YARN cluster, we take on the responsibility to define our 5 "service" ports using the Marathon/HAProxy service ports. (This now requires HAProxy as well as mesos-dns. I'd recommend some work being done on documenting HAProxy for use with the haproxy script; I know that I stumbled a bit trying to get HAProxy set up, but that may just be my own lack of knowledge on the subject.) These ports will have to be available across the cluster, and will map to whichever ports Mesos assigns to the RM.
> > > >
> > > > This makes sense to me. A "YARN cluster creation" event on a Mesos cluster is something we want to be flexible, but it's not something that will likely be "self service". I.e., we won't have users just creating YARN clusters at will. It will likely be something where, when requested, the admin can identify 5 available service ports and lock those into that cluster... that way, when the YARN RM spins up, it has its service ports defined (and thus the node managers always know which ports to connect to).
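As a sketch of this "admin assigns 5 service ports per cluster" idea: in a Marathon app definition of that era, the "ports" array declares the service ports, while the host ports Mesos actually assigns are handed to the task as $PORT0-$PORT4 environment variables (with requirePorts left at its default of false). The id and cmd below are placeholders, not Myriad's real launch command:

    {
      "id": "yarn-prod-rm",
      "cmd": "bin/yarn resourcemanager",
      "cpus": 2,
      "mem": 4096,
      "instances": 1,
      "ports": [15000, 15001, 15002, 15003, 15004]
    }

(JSON carries no comments, so to be explicit: everything above is illustrative, only the five service ports come from the thread's example.)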
> > > > Combined with Mesos-DNS, this could actually work out very well, as the name of the RM can be hard-coded, and the ports will just work no matter which node it spins up on.
> > > >
> > > > From an HA perspective, the only advantage at this point of preallocating the failover RM is speed of recovery (and a guarantee of resources being available if failover occurs). Perhaps we could consider this as an option for those who need fast or guaranteed recovery, but not make it a requirement?
> > > >
> > > > The service port method will not work, however, for the node manager ports. That said, I "believe" that as Myriad spins up a node manager, it can dynamically allocate the ports and thus report them to the resource manager on registration. Someone may need to help me out on that one, as I am not sure. Also, since the node manager is host-specific, mesos-dns is not required; it can register with the resource manager using whatever ports are allocated and the hostname it's running on. I guess the question here is: when Myriad requests the resources and Mesos allocates the ports, can Myriad, prior to actually starting the node manager, update the configs with the allocated ports? Or is this even needed?
> > > >
> > > > This is a great discussion.
> > > >
> > > > On Mon, May 11, 2015 at 9:58 PM, yuliya Feldman <[email protected]> wrote:
> > > >
> > > > > As far as I understand, in this case Apache YARN RM HA will kick in, which means the ids, hosts, and ports for all RMs will need to be defined somewhere, and I wonder how that will be done in this situation, since those need to be either in yarn-site.xml or passed with "-D". In the case of Mesos-DNS usage there is no need to set up RM HA at all, and no warm standby is needed: Marathon will start the RM somewhere in case of failure, and clients will rediscover it based on the same hostname. Am I missing anything?
> > > > >
> > > > > From: Adam Bordelon <[email protected]>
> > > > > To: [email protected]
> > > > > Sent: Monday, May 11, 2015 7:26 PM
> > > > > Subject: Re: Recommending or requiring mesos dns?
> > > > >
> > > > > I'm a +1 for random ports. You can also use Marathon's servicePort field to let HAProxy redirect from the servicePort to the actual hostPort for the service on each node. Mesos-DNS will similarly direct you to the correct host:port given the appropriate task name.
> > > > >
> > > > > Is there a reason we can't just have Marathon launch two RM tasks for the same YARN cluster? One would be the leader, and the other would redirect to it until failover. Once one fails over, the other will start taking traffic, and Marathon will try to launch a new backup RM when the resources are available. If the YARN RM cannot provide us this functionality on its own, perhaps we can write a simple wrapper script for it.
> > > > >
> > > > > On Fri, May 8, 2015 at 11:57 AM, John Omernik <[email protected]> wrote:
> > > > >
> > > > > > I would advocate random ports because there should not be a limitation of running only one RM per node.
> > > > > > If we want true portability, there should be the ability to have the RM for cluster YarnProd run on node1 and also have the RM for cluster YarnDev running on node1 (if it so happens to land that way). That way the number of clusters isn't limited by the number of physical nodes.
> > > > > >
> > > > > > On Fri, May 8, 2015 at 1:33 PM, Santosh Marella <[email protected]> wrote:
> > > > > >
> > > > > > > RM can store its data either in HDFS or in ZooKeeper. The data store is configurable. There is a config property in YARN (yarn.resourcemanager.recovery.enabled) that tells RM whether it should try to recover from the state store the metadata about previously submitted apps, the containers allocated to them, etc.
> > > > > > >
> > > > > > > Preallocation of a backup RM is a great idea. Thinking about it a bit more, I felt it might be better to have such an option available in Marathon rather than building it into Myriad (and into every framework/service that wants HA/failover).
> > > > > > >
> > > > > > > Let's say we launch a service X via Marathon that requires some resources (cpus/mem/ports), and we want 1 instance of that service to always be available. Marathon promises restart of the service if it goes down. But, as far as I understand, Marathon can restart the service on another node only if the resources required by service X are available on that node *after* the service goes down. In other words, Marathon doesn't proactively "reserve" these resources on another node as a backup for failover.
> > > > > > >
> > > > > > > Again, not all services launched via Marathon require this, but perhaps there should be a config option to specify whether a service wants Marathon to keep a backup node ready to go in the event of failure.
> > > > > > >
> > > > > > > On Thu, May 7, 2015 at 4:12 PM, John Omernik <[email protected]> wrote:
> > > > > > >
> > > > > > > > So I may be looking at this wrong, but where is the data for the RM stored if it does fail over? How will it know to pick up where it left off? This is just one area where my understanding is limited.
> > > > > > > >
> > > > > > > > That said, what about preallocating a second, failover RM somewhere on the cluster? (I am just tossing out an idea here; there are probably many reasons not to do this.) Here is how I could see it happening.
> > > > > > > >
> > > > > > > > 1. Myriad starts an RM, asking for 5 random available ports. Mesos replies, starting the RM, and reports to Myriad the 5 ports used for the services you listed below.
> > > > > > > >
> > > > > > > > 2. Myriad then checks a config value for the number of "hot spares"; let's say we specify 1.
> > > > > > > > Myriad then puts in a resource request to Mesos for the CPU and memory required for the RM, but specifically asks for the same 5 ports allocated to the first. Basically, it reserves a spot on another node with the same ports available. It may take a bit, but there should be that availability. Until this request is met, the YARN cluster is in an HA-compromised position.
> > > > > > >
> > > > > > > This is exactly what I think we should do, but why use random ports instead of standard RM ports? If you have 10 slave nodes in your Mesos cluster, then there are 10 potential spots for the RM to be launched on. However, if you choose to launch multiple RMs (multiple YARN clusters), then you can probably launch at most 5 (with the remaining 5 nodes available ...
> > > > > > > >
> > > > > > > > 3. At this point, perhaps we start another instance of the RM right away (this depends on my first question about where the RM stores info about jobs/applications), or the framework just holds the spot, waiting for a lack of heartbeat (failover condition) on the primary resource manager.
> > > > > > > >
> > > > > > > > 4. If we can run the spare with no issues, it's a simple update of the DNS record, and the node managers connect to the new RM (and another RM is preallocated for redundancy). If we can't actually execute the secondary RM until failover conditions, we can now execute the new RM, and the ports will be the same.
> > > > > > > >
> > > > > > > > This may seem kludgey at first, but done correctly it may actually limit the length of failover, since the RM is preallocated. RMs are not huge from a resource perspective, so it may be a small cost for those who want failover and multiple clusters (and thus have dynamic ports).
> > > > > > > >
> > > > > > > > I will keep thinking this through, and would welcome feedback.
> > > > > > > >
> > > > > > > > On Thursday, May 7, 2015, Santosh Marella <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi John,
> > > > > > > > >
> > > > > > > > > Great views about extending mesos-dns for the RM's discovery. Some thoughts:
> > > > > > > > >
> > > > > > > > > 1. There are 5 primary interfaces RM exposes that are bound to standard ports:
> > > > > > > > >    a. RPC interface for clients that want to submit applications to YARN (port 8032).
> > > > > > > > >    b. RPC interface for NMs to connect back/heartbeat to RM (port 8031).
> > > > > > > > >    c. RPC interface for App Masters to connect back/heartbeat to RM (port 8030).
> > > > > > > > >    d. RPC interface for admins to interact with RM via CLI (port 8033).
> > > > > > > > >    e. Web interface for RM's UI (port 8088).
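For reference, the five interfaces listed above correspond to standard yarn-site.xml properties, so pinning a cluster's service ports could look roughly like the sketch below (hostname and ports taken from John's earlier example, not from Myriad; only two of the five shown):

    <!-- sketch only: host and ports are this thread's example values -->
    <property>
      <name>yarn.resourcemanager.address</name>
      <value>myriad-dev-1.marathon.mesos:15000</value>
      <!-- (a) client RPC, normally 8032 -->
    </property>
    <property>
      <name>yarn.resourcemanager.resource-tracker.address</name>
      <value>myriad-dev-1.marathon.mesos:15001</value>
      <!-- (b) NM heartbeat, normally 8031 -->
    </property>
    <!-- likewise yarn.resourcemanager.scheduler.address (c),
         yarn.resourcemanager.admin.address (d),
         and yarn.resourcemanager.webapp.address (e) -->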
> > > > > > > > > 2. When we launch RM using Marathon, it's probably better to mention in Marathon's config that RM will use the above ports. This is because, if RM listens on random ports (as opposed to the standard ports listed above), then when RM fails over, the new RM gets ports that might be different from the ones used by the old RM. That makes the RM's discovery hard, especially post-failover.
> > > > > > > > >
> > > > > > > > > 3. It looks like what you are proposing is a way to tell mesos-dns which ports RM's services are listening on, and when RM fails over, these ports would get updated in mesos-dns. Is my understanding correct? If yes, one challenge I see is that the clients that want to connect to the above-listed RM interfaces also need to pull the changes to RM's port numbers from mesos-dns dynamically. I am not sure how that might be possible.
> > > > > > > > >
> > > > > > > > > Regarding your question about NM ports:
> > > > > > > > >
> > > > > > > > > 1. NM has the following ports:
> > > > > > > > >    a. RPC port for app masters to launch containers (this is a random port).
> > > > > > > > >    b. RPC port for the localization service (port 8040).
> > > > > > > > >    c. Web port for NM's UI (port 8042).
> > > > > > > > >
> > > > > > > > > 2. Ports (a) and (c) are relayed to RM when NM registers with RM. Port (b) is passed to a local container-executor process via command-line args.
> > > > > > > > >
> > > > > > > > > 3. As you rightly reckon, we need a mechanism at NM launch to pass the Mesos-allocated ports to NM for the above interfaces. We can try to use the variable expansion mechanism Hadoop has (http://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/conf/Configuration.html) to achieve this.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Santosh
> > > > > > > > >
> > > > > > > > > On Thu, May 7, 2015 at 3:51 AM, John Omernik <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > I've implemented mesos-dns and use Marathon to launch my Myriad framework. It shows up as myriad.marathon.mesos and makes it easy to find which node the framework launched the resource manager on.
> > > > > > > > > >
> > > > > > > > > > What if we made Myriad mesos-dns aware, so that prior to launching the YARN RM, it could register in mesos-dns? This would mean both the IP addresses and the ports (we need to figure out multiple ports in mesos-dns). Then it could write out the ports and hostnames in the NM configs by checking mesos-dns for which ports the resource manager is using.
> > > > > > > > > >
> > > > > > > > > > Side question: when a node manager registers with the resource manager, are the ports the NM is running on completely up to the NM?
> > > > > > > > > > I.e., I can run my NM web server on any port, and YARN just explains that to the RM on registration? Because then we need a mechanism at launch of the NM task to understand which ports Mesos has allocated to the NM and update the yarn-site for that NM before launch.... Perhaps mesos-dns as a requirement isn't needed, but I am trying to walk through options that get us closer to multiple YARN clusters on a Mesos cluster.
> > > > > > > > > >
> > > > > > > > > > John
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Sent from my iThing
> > > > > > > >
> > > > > > > > --
> > > > > > > > Sent from my iThing
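On Santosh's variable-expansion suggestion: a rough sketch of how Mesos-allocated ports might be threaded into an NM's config, assuming the launch command can see them as environment variables. The $PORT0/$PORT1 names follow Marathon's convention and are hypothetical here; yarn.nodemanager.webapp.address and yarn.nodemanager.localizer.address are the standard properties behind the 8042 and 8040 ports listed above. The NM would be launched with the ports injected as system properties, e.g.

    # nm.webapp.port / nm.localizer.port are made-up keys for this sketch
    export YARN_NODEMANAGER_OPTS="-Dnm.webapp.port=$PORT0 -Dnm.localizer.port=$PORT1"

and yarn-site.xml would reference them, relying on Hadoop Configuration's ${...} expansion from system properties:

    <property>
      <name>yarn.nodemanager.webapp.address</name>
      <value>0.0.0.0:${nm.webapp.port}</value>
    </property>
    <property>
      <name>yarn.nodemanager.localizer.address</name>
      <value>0.0.0.0:${nm.localizer.port}</value>
    </property>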
