Great questions, Dave!

Here is what I think:
1. Today, an organization that wants to run both YARN and Mesos typically
runs two separate clusters, each with dedicated resources (compute,
storage, networking). I don't think this is the best use of the hardware,
because resources cannot be shared between the clusters: one cluster can be
starved for resources while the other sits idle.

Better resource sharing and utilization
With Myriad, we delegate YARN's resource management to Mesos. You can have
a single Mesos cluster with a single DFS instance (HDFS, MapRFS, etc.) on
which you run any Mesos application, including YARN.
For example, imagine you run a fleet of web servers during the day and want
to analyze their logs with YARN at night, when web traffic is low. At night
you can scale down the web server instances, returning their resources to
Mesos, which can then use them to launch and expand a YARN cluster.

Multi-tenancy
Things become more interesting when you launch multiple tenant YARN
clusters on Mesos, each of which can expand and contract dynamically
alongside other Mesos applications. Let's extend the earlier example. You
still have one physical cluster running Mesos and the DFS, but resources
are now shared between the web servers and two tenant YARN clusters (say,
one for the Engineering department and another for Finance). At the end of
a quarter, Finance could be allocated more resources by shutting down a few
NodeManagers in the Engineering YARN cluster (or a few web servers) and
launching new NodeManagers in the Finance YARN cluster.

YARN as a service
Say your organization grows and you need more YARN clusters (dev, test,
prod, finance). You can still have one physical cluster running Mesos and a
single DFS instance. The physical cluster scales as you add nodes to it, and
the new resources become available to all Mesos applications, including the
web servers and the tenant YARN clusters. Each tenant YARN cluster can
expand and shrink dynamically. You might also want to run multiple versions
of YARN (say 2.7 in prod and 3.0 in dev); this is easy to do using Docker or
a binary distribution. Other ideas have been floated for using Docker to
completely isolate YARN clusters from one another (compute, storage,
networking).

Fine-grained scaling
Another useful feature of Myriad is fine-grained scaling (FGS). With FGS,
each YARN cluster has a certain guaranteed capacity, but it can also use
resources beyond that guarantee when they are available in the cluster.
This improves resource utilization when the physical cluster is lightly
loaded.
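To make the guaranteed-versus-burst idea concrete, here is a toy Python
model. It is only a sketch under assumptions made for illustration, not
Myriad's actual FGS implementation: capacity is counted in whole CPUs, each
tenant's guarantee is always held back for it, and the unreserved remainder
is lent one CPU at a time, round-robin, to tenants with unmet demand. The
names Tenant and allocate are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Tenant:
    name: str
    guaranteed: int  # CPUs reserved for this tenant (hypothetical unit)
    demand: int      # CPUs the tenant's pending work would like to use


def allocate(total: int, tenants: list[Tenant]) -> dict[str, int]:
    """Toy model of guaranteed capacity plus opportunistic bursting.

    Guarantees are reserved up front; the unreserved remainder of the
    cluster is lent one CPU at a time, round-robin, to tenants whose
    demand exceeds what they already hold.
    """
    reserved = sum(t.guaranteed for t in tenants)
    if reserved > total:
        raise ValueError("guarantees exceed cluster capacity")

    # Each tenant first gets its demand, capped at its guarantee.
    alloc = {t.name: min(t.demand, t.guaranteed) for t in tenants}
    spare = total - reserved  # capacity nobody is guaranteed

    while spare > 0:
        hungry = [t for t in tenants if alloc[t.name] < t.demand]
        if not hungry:
            break  # all demand met; remaining CPUs stay unallocated
        for t in hungry:
            if spare == 0:
                break
            alloc[t.name] += 1
            spare -= 1
    return alloc


if __name__ == "__main__":
    tenants = [Tenant("engineering", guaranteed=4, demand=10),
               Tenant("finance", guaranteed=4, demand=5)]
    print(allocate(16, tenants))  # {'engineering': 10, 'finance': 5}
    print(allocate(8, tenants))   # {'engineering': 4, 'finance': 4}
```

On a lightly loaded 16-CPU cluster, Engineering bursts well past its
guarantee; on a fully loaded 8-CPU cluster, each tenant gets exactly its
guarantee. Holding guarantees back even when idle is the conservative
choice; a real scheduler could instead lend idle guaranteed capacity and
reclaim it later.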

2. Not many that I can think of. If MapReduce is all you need to run, then
MR1 on Mesos might work well, although you might not get FGS. I think one of
the reasons YARN came into being was that the Hadoop community wanted to
separate cluster resource management from the application (MapReduce).
With Myriad, any new application that runs on YARN also automatically runs
on Mesos, without anyone having to write a Mesos framework for it.

3. I feel data locality needs to be better addressed. Several ideas have
been floated, such as running multiple NodeManagers per node in FGS mode to
take better advantage of data locality.

4. I think this will become clearer once 3 is addressed.

Hope this helps.

Regards
Swapnil


On Mon, May 30, 2016 at 12:02 AM, Dave Webb <dave.w...@gmx.de> wrote:

> Hi,
>
> I have read about Mesos [1], Yarn [2] and Myriad [3], but I couldn't find
> an explicit answer to a few general questions. First of all, I don't have
> an actual cluster with a business use case to solve, but I'm interested in
> the technologies and motivation behind these systems.
>
> From my understanding, Myriad is a Mesos framework (just like Marathon,
> Spark, ...) that acts as a "wrapper" around Yarn. This enables a dynamic
> coexistence of Yarn and Mesos on the same cluster, which was originally
> not possible.
> However, from a theoretical standpoint, Yarn and Mesos appear to be, in
> general, only different variations of the same thing: resource negotiators
> in a cluster environment.
> This leads to the first question:
>
> (1) Why would you want to run Mesos and Yarn together?
> What would be the disadvantages of choosing only one of them?
> One valid argument might be that there are Mesos Frameworks / Yarn
> Applications which you don't want to port to Yarn / Mesos and vice versa.
> Myriad would allow you to use Mesos (and all frameworks built for it), but
> still use all Yarn applications.
>
> Nevertheless, I suspect that in many cases, even though there surely are
> interesting Yarn applications, the most prominent example is MapReduce.
> However, MapReduce v1 was ported to a Mesos framework [1, 4] several
> years ago.
> This leads to the second question:
>
> (2) What are the advantages of running MapReduce v2 using Yarn via Myriad
> on a Mesos Cluster instead of running the MapReduce v1 framework directly
> on Mesos?
> One might argue that the first option sounds like more overhead, but as
> MapReduce is typically batch oriented this argument might not stand too
> well.
>
> Due to the different strategies of Mesos (offer oriented) and Yarn
> (request oriented), one question regarding applications which require data
> locality (e.g. MapReduce) pops up:
>
> (3) How can Myriad provide good data locality for applications with high
> dependence on data locality?
> As the underlying Mesos system negotiates resources via offers, it seems
> that a framework has few possibilities aside from waiting for matching
> offers. Is this the strategy Myriad employs?
>
> And this leads to my final question:
> (4) How does Yarn via Myriad on Mesos compare to Yarn "alone"?
> Have there been studies about Myriad, potentially with such evaluations,
> yet?
>
> I'm grateful for any input. Thank you very much!
>
> Cheers,
> Dave
>
> [1] http://static.usenix.org/events/nsdi11/tech/full_papers/Hindman_new.pdf
> [2] https://www.sics.se/~amir/files/download/dic/2013%20-%20Apache%20Hadoop%20YARN:%20Yet%20Another%20Resource%20Negotiator%20(SoCC).pdf
> [3] http://myriad.incubator.apache.org/
> [4] https://github.com/mesos/hadoop
>
