Great questions, Dave! Here is what I think.

1. Today, an organization that runs both Yarn and Mesos typically has two separate clusters, each with dedicated resources (compute, storage, networking). I feel this is not the best use of the hardware, because resources cannot be shared between the clusters: one cluster can be starved for resources while the other sits idle.

Better resource sharing and utilization

With Myriad, we delegate all Yarn resource management to Mesos. You can have one Mesos cluster and one instance of the DFS (HDFS, MapRFS, etc.) on which you can run any Mesos application, including Yarn. For example, imagine you run a bunch of web servers during the day and want to analyse the web server logs with Yarn at night, when web traffic is lower. You can reduce the number of web server instances, returning resources to Mesos, which can then use them to launch and expand a Yarn cluster.

Multi-tenancy

Things become more interesting when you launch multiple tenant Yarn clusters on Mesos, each of which can expand and contract dynamically alongside other Mesos applications. Let's extend the earlier example. You still have one physical cluster running Mesos and the DFS. Resources are shared between the web servers and two tenant Yarn clusters (say, one for the Engineering department and another for Finance). At the end of a quarter, the Finance department could be allocated more resources by shutting down a few nodemanagers from the Engineering Yarn cluster, or a few web servers, and launching new nodemanagers for the Finance Yarn cluster.

Yarn as a service

Say your org grows and you need more Yarn clusters (dev, test, prod, finance). You can still have one physical cluster running Mesos and a single DFS instance. The physical cluster scales as you add new nodes to it, and the new resources become available to all Mesos applications, including the web servers and the tenant Yarn clusters. Each tenant Yarn cluster can expand and shrink dynamically. You might also want to run multiple versions of Yarn in your clusters (say 2.7 in prod vs 3.0 in dev). One could easily do this using Docker or a binary distribution. Other ideas have been floated around complete isolation (compute, storage, networking) between Yarn clusters using Docker.

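To make the day/night and end-of-quarter examples above a bit more concrete, here is a toy Python sketch of one shared pool whose tenants grow and shrink. It is only an illustration of the idea: SharedCluster, scale_up, scale_down, the tenant names and the CPU numbers are all made up for this mail and are not part of the Myriad or Mesos APIs.

```python
from dataclasses import dataclass, field


@dataclass
class SharedCluster:
    """Toy model of one physical pool shared by several tenants (hypothetical)."""

    total_cpus: int
    allocations: dict = field(default_factory=dict)

    def free_cpus(self) -> int:
        # Whatever no tenant currently holds is available to everyone.
        return self.total_cpus - sum(self.allocations.values())

    def scale_up(self, tenant: str, cpus: int) -> None:
        # A tenant can only grow into capacity that is currently free.
        if cpus > self.free_cpus():
            raise ValueError(f"only {self.free_cpus()} CPUs free, {cpus} requested")
        self.allocations[tenant] = self.allocations.get(tenant, 0) + cpus

    def scale_down(self, tenant: str, cpus: int) -> None:
        # Shrinking a tenant returns its CPUs to the shared pool.
        current = self.allocations.get(tenant, 0)
        if cpus > current:
            raise ValueError(f"{tenant} holds only {current} CPUs")
        self.allocations[tenant] = current - cpus


# Daytime: web servers hold most of the (assumed) 100-CPU cluster.
cluster = SharedCluster(total_cpus=100)
cluster.scale_up("webservers", 60)
cluster.scale_up("yarn-engineering", 20)
cluster.scale_up("yarn-finance", 20)

# Night: web traffic drops, so web servers shrink and a Yarn cluster expands.
cluster.scale_down("webservers", 40)
cluster.scale_up("yarn-engineering", 40)

# End of quarter: move capacity from Engineering to Finance.
cluster.scale_down("yarn-engineering", 30)
cluster.scale_up("yarn-finance", 30)

print(cluster.allocations)
```

With two statically partitioned clusters, the CPUs freed by the web servers at night would simply sit idle; in the shared pool they can be picked up by whichever tenant needs them.
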
Fine-grained scaling

Another cool thing that Myriad provides is fine-grained scaling (fgs for short). With fgs, each Yarn cluster has a certain guaranteed capacity, but it can also use resources beyond that guarantee if they are available in the cluster. This improves resource utilization when the physical cluster is lightly loaded.

2. Not many that I can think of. If MapReduce is all that you need to run, then MR1 on Mesos might work well, although you might not have access to fgs. I think one of the reasons YARN came into being was that the Hadoop community wanted to separate cluster resource management from the application (MapReduce). With Myriad, any new application that runs on Yarn also automatically runs on Mesos, without the need for a Mesos framework to be written for it.

3. I feel data locality needs to be better addressed. Several ideas have been floated around, such as running multiple nodemanagers per node in fgs mode to take advantage of data locality.

4. I think this will become clearer once 3 is addressed.

Hope this helps.

Regards,
Swapnil

On Mon, May 30, 2016 at 12:02 AM, Dave Webb <dave.w...@gmx.de> wrote:
> Hi,
>
> I have read about Mesos [1], Yarn [2] and Myriad, but I couldn't find an
> explicit answer to a few general questions. First of all, I don't have an
> actual cluster with a business usecase to solve, but I'm interested in the
> technologies and motivation behind these systems.
>
> From my understanding Myriad is a Mesos Framework (just like Marathon,
> Spark, ...) which acts as a "wrapper" around Yarn. This enables a dynamic
> coexistence of Yarn and Mesos on the same cluster which was originally not
> possible.
> However, from a theoretical standpoint, Yarn and Mesos appear to be - in
> general - only different variations of the same thing: Resource Negotiators
> in a cluster environment.
> This leads to the first question:
>
> (1) Why would you want to run Mesos and Yarn together?
> What would be the disadvantages of choosing only one of them?
> One valid argument might be that there are Mesos Frameworks / Yarn
> Applications which you don't want to port to Yarn / Mesos and vice versa.
> Myriad would allow you to use Mesos (and all frameworks built for it), but
> still use all Yarn applications.
>
> Nevertheless, in many cases I would suspect that even though there surely
> are interesting Yarn applications, the most prominent example is MapReduce.
> However, MapReduce v1 has been ported to a Mesos Framework [1, 3] several
> years ago.
> This leads to the second question:
>
> (2) What are the advantages of running MapReduce v2 using Yarn via Myriad
> on a Mesos Cluster instead of running the MapReduce v1 framework directly
> on Mesos?
> One might argue that the first option sounds like more overhead, but as
> MapReduce is typically batch oriented this argument might not stand too
> well.
>
> Due to the different strategies of Mesos (offer oriented) and Yarn
> (request oriented), one question regarding applications which require data
> locality (e.g. MapReduce) pops up:
>
> (3) How can Myriad provide good data locality for applications with high
> dependence on data locality?
> As the underlying Mesos system negotiates resources via offers, it seems
> that a framework has few possibilities aside from waiting for matching
> offers. Is this the strategy Myriad employs?
>
> And this leads to my final question:
> (4) How does Yarn via Myriad on Mesos compare to Yarn "alone"?
> Have there been studies about Myriad, potentially with such evaluations,
> yet?
>
> I'm grateful for any input, Thank you very much!
>
> Cheers,
> Dave
>
> [1]
> http://static.usenix.org/events/nsdi11/tech/full_papers/Hindman_new.pdf
> [2]
> https://www.sics.se/~amir/files/download/dic/2013%20-%20Apache%20Hadoop%20YARN:%20Yet%20Another%20Resource%20Negotiator%20(SoCC).pdf
> [3] http://myriad.incubator.apache.org/
> [4] https://github.com/mesos/hadoop
>