Re: Questions about Myriad background

Darin Johnson Tue, 31 May 2016 07:05:10 -0700

Dave, I see Swapnil answered but I thought I'd give you my perspective as
well.

(1)  Why would you want to run Mesos and Yarn together?
YARN is THE scheduler for data-driven apps: Map/Reduce, Hive, Tez,
Spark, Giraph, etc.  However, it is very opinionated: the only resources are
cpu, memory and disk, which makes it difficult to run other applications one
might like to run near their data without dedicated hardware.  These may
include Jupyter notebooks, TensorFlow, R, etc.  Using Myriad and Mesos
gives you both the data processing workloads of YARN and the
flexibility to run other applications via Mesos (usually using Marathon or
Chronos).

(2) What are the advantages of running MapReduce v2 using Yarn via Myriad
on a Mesos Cluster instead of running the MapReduce v1 framework directly
on Mesos?
If we go beyond MapReduce v2 and extend to YARN, we get the benefit of all
YARN frameworks (Flink, Tez, Giraph, etc.) with no need to rewrite the
schedulers.  Comparing purely MapReduce v2 to MapReduce v1, there's the
advantage of being able to restart the Resource Manager (analogous to the
JobTracker) without stopping all MapReduce jobs.  Also, the MapReduce v1
framework is pretty much dead and has a few major issues I won't go into
unless you're interested.

(3) How can Myriad provide good data locality for applications with high
dependence on data locality?
If not using FGS (fine-grained scaling), the data locality is pretty much
the same as YARN.  For the FGS case, Swapnil answered this pretty well.

(4) How does Yarn via Myriad on Mesos compare to Yarn "alone"?
I've done some work on this.  One major issue is that when using FGS with
short-lived Map tasks (<30 secs), Myriad can't get offers quickly enough to
launch containers to fully utilize the cluster.  This isn't an issue for
traditional MapReduce jobs, but it does affect tools like Hive.  There's an
open JIRA to address this (
https://issues.apache.org/jira/browse/MYRIAD-199?jql=project%20%3D%20MYRIAD
).
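
To make the effect concrete, here is a back-of-the-envelope sketch (not a
measurement, and not anything Myriad itself computes).  It assumes a
deliberately simplified model in which a container slot sits idle for a fixed
offer wait before every task; the function name and the 5 second wait are
hypothetical values chosen only for illustration.

```python
def slot_utilization(task_secs: float, offer_wait_secs: float) -> float:
    """Fraction of time a slot does useful work, assuming (hypothetically)
    that each task is preceded by a fixed wait for a Mesos offer."""
    if task_secs <= 0 or offer_wait_secs < 0:
        raise ValueError("task_secs must be > 0 and offer_wait_secs >= 0")
    return task_secs / (task_secs + offer_wait_secs)


if __name__ == "__main__":
    assumed_offer_wait = 5.0  # hypothetical wait, in seconds
    for task_secs in (10.0, 30.0, 300.0):
        share = slot_utilization(task_secs, assumed_offer_wait)
        print(f"{task_secs:>5.0f}s tasks -> {share:.0%} of slot time busy")
```

Under this model the idle share grows as tasks get shorter, which is why
long-running MapReduce jobs barely notice the wait while Hive-style workloads
with many short tasks do.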

Hope this helps,
Darin

On Mon, May 30, 2016 at 3:02 AM, Dave Webb <dave.w...@gmx.de> wrote:

> Hi,
>
> I have read about Mesos [1], Yarn [2] and Myriad, but I couldn't find an
> explicit answer to a few general questions. First of all, I don't have an
> actual cluster with a business usecase to solve, but I'm interested in the
> technologies and motivation behind these systems.
>
> From my understanding Myriad is a Mesos Framework (just like Marathon,
> Spark, ...) which acts as a "wrapper" around Yarn. This enables a dynamic
> coexistence of Yarn and Mesos on the same cluster which was originally not
> possible.
> However, from a theoretical standpoint, Yarn and Mesos appear to be - in
> general - only different variations of the same thing: Resource Negotiators
> in a cluster environment.
> This leads to the first question:
>
> (1) Why would you want to run Mesos and Yarn together?
> What would be the disadvantages of choosing only one of them?
> One valid argument might be that there are Mesos Frameworks / Yarn
> Applications which you don't want to port to Yarn / Mesos and vice versa.
> Myriad would allow you to use Mesos (and all frameworks built for it), but
> still use all Yarn applications.
>
> Nevertheless, in many cases I would suspect that even though there surely
> are interesting Yarn applications, the most prominent example is MapReduce.
> However, MapReduce v1 has been ported to a Mesos Framework [1, 3] several
> years ago.
> This leads to the second question:
>
> (2) What are the advantages of running MapReduce v2 using Yarn via Myriad
> on a Mesos Cluster instead of running the MapReduce v1 framework directly
> on Mesos?
> One might argue that the first option sounds like more overhead, but as
> MapReduce is typically batch oriented this argument might not stand too
> well.
>
> Due to the different strategies of Mesos (offer oriented) and Yarn
> (request oriented), one question regarding applications which require data
> locality (e.g. MapReduce) pops up:
>
> (3) How can Myriad provide good data locality for applications with high
> dependence on data locality?
> As the underlying Mesos system negotiates resources via offers, it seems
> that a framework has few possibilities aside from waiting for matching
> offers. Is this the strategy Myriad employs?
>
> And this leads to my final question:
> (4) How does Yarn via Myriad on Mesos compare to Yarn "alone"?
> Have there been studies about Myriad, potentially with such evaluations,
> yet?
>
> I'm grateful for any input, Thank you very much!
>
> Cheers,
> Dave
>
> [1]
> http://static.usenix.org/events/nsdi11/tech/full_papers/Hindman_new.pdf
> [2]
> https://www.sics.se/~amir/files/download/dic/2013%20-%20Apache%20Hadoop%20YARN:%20Yet%20Another%20Resource%20Negotiator%20(SoCC).pdf
> [3] http://myriad.incubator.apache.org/
> [4] https://github.com/mesos/hadoop
>
