Re: mesos and moving jobs between clusters

Madhusudhan Govindaraju Tue, 01 Nov 2016 06:35:31 -0700


Hello Mangirish,


Here is the text from Aurora's GitHub page:

---
When and when not to use Aurora

Aurora can take over for most uses of software ... However, if you have very specific scheduling requirements, or are building a system that looks like a scheduler itself, you may want to explore developing your own framework.

---

We believe Airavata will need a framework to customize scheduling policies for different communities, and so instead of making big changes in Aurora, we want to develop our own framework. Once you or Gourav-Shenoy have Airavata working with Aurora/Mesos, the idea is that Pankaj will work with you to use the same codebase/task-module in Airavata to launch jobs on Mesos using a custom framework.


-Madhu



On 10/28/2016 12:46 PM, Mangirish Wagle wrote:

Hi Pankaj,

I was curious to know what your motivation is for developing a custom framework rather than using Aurora or another existing robust framework. It would be great if you could share some pointers on that. I would also like to know what specific use cases you are targeting through your framework, what stability concerns you may have identified, and how you are planning to handle them.


Regards,
Mangirish

On Tue, Oct 25, 2016 at 6:09 PM, Pankaj Saha <[email protected]<mailto:[email protected]>> wrote:


    Hi Mark,


    Mesos collects the resource information from all the nodes in the
    cluster (cores, memory, disk, and gpu) and presents a unified
    view, as if it is a single operating system. Mesosphere, the
    commercial entity behind Mesos, has built an ecosystem around
    Mesos as the kernel, called the "Data Center Operating System
    (DCOS)". Frameworks interact with Mesos to reserve resources and
    then use these resources to run jobs on the cluster. So, for
    example, if
    multiple frameworks such as Marathon, Apache Aurora, and a
    custom-MPI-framework are using Mesos, then there is a negotiation
    between Mesos and each framework on how many resources each
    framework gets. Once the framework, say Aurora, gets resources, it
    can decide how to use those resources. Some of the strengths of
    Mesos include fault tolerance at scale and the ability to
    co-schedule applications/frameworks on the cluster such that
    cluster utilization is high.
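
As a rough illustration of the negotiation described above, here is a toy sketch in Python. It is not the Mesos API: the `Resources`, `Framework`, and `offer_round` names are hypothetical, and frameworks are offered resources in a fixed order, whereas real Mesos uses its own allocator to decide who receives each offer. The only point is that each framework decides for itself how to use what it is offered.

```python
from dataclasses import dataclass, field


@dataclass
class Resources:
    """Free capacity on one agent, or the needs of one task (hypothetical model)."""
    cpus: int
    mem_mb: int
    gpus: int = 0

    def fits(self, need: "Resources") -> bool:
        return (self.cpus >= need.cpus
                and self.mem_mb >= need.mem_mb
                and self.gpus >= need.gpus)

    def minus(self, need: "Resources") -> "Resources":
        return Resources(self.cpus - need.cpus,
                         self.mem_mb - need.mem_mb,
                         self.gpus - need.gpus)


@dataclass
class Framework:
    """A scheduler that decides how to use whatever it is offered."""
    name: str
    pending: list
    launched: list = field(default_factory=list)

    def consider_offer(self, agent: str, offer: Resources) -> Resources:
        # Launch every pending task that fits; hand back what is left over.
        remaining = offer
        still_pending = []
        for task in self.pending:
            if remaining.fits(task):
                remaining = remaining.minus(task)
                self.launched.append((agent, task))
            else:
                still_pending.append(task)
        self.pending = still_pending
        return remaining


def offer_round(agents: dict, frameworks: list) -> dict:
    """Offer each agent's free resources to every framework in turn."""
    for agent in agents:
        for fw in frameworks:
            agents[agent] = fw.consider_offer(agent, agents[agent])
    return agents


# Two agents, two frameworks competing for them (all values made up).
agents = {"node-a": Resources(8, 16000, 1), "node-b": Resources(4, 8000)}
aurora = Framework("aurora", [Resources(4, 4000), Resources(4, 4000)])
mpi = Framework("custom-mpi", [Resources(4, 8000, 1), Resources(2, 2000)])
leftover = offer_round(agents, [aurora, mpi])
```

In this run the first framework fills node-a, the second places its small task on node-b, and the GPU task stays pending because no single agent still has both the cores and the GPU free.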


    Mesos off-the-shelf only works when the Master and agent nodes have
    a line of communication to each other. We have worked on modifying
    the Mesos installation so that it even works when agents are
    behind firewalls on campus clusters. We are also working on
    getting the same setup to work on Jetstream and Chameleon where
    allocations are a mix of public IPs and internally accessible
    nodes. This will allow us to use Mesos to meta-schedule across
    clusters. We are also developing our own framework, to be able to
    customize scheduling and resource negotiations for science
    gateways on Mesos clusters. Our plan is to work with Suresh and
    Marlon's team so that it works with Airavata.


    I will be presenting at the Gateways workshop in November, and
    then I will also be at SC along with my adviser (Madhu
    Govindaraju), if you would like to discuss any of these projects.


    We are working on packaging our work so that it can be shared with
    this community.

    Thanks

    Pankaj


    On Tue, Oct 25, 2016 at 11:36 AM, Mangirish Wagle
    <[email protected] <mailto:[email protected]>> wrote:

        Hi Mark,

        Thanks for your question. So if I understand you correctly,
        you need kind of load balancing between identical clusters
        through a single Mesos master?

        With the current setup, from what I understand, we have
        separate Mesos masters for every cluster on separate clouds.
        However, it's a good investigative topic whether we can have a
        single Mesos master targeting multiple identical clusters. We have
        some work ongoing to use a virtual cluster setup with compute
        resources across clouds to install mesos, but not sure if that
        is what you are looking for.

        Regards,
        Mangirish




        On Tue, Oct 25, 2016 at 11:05 AM, Miller, Mark
        <[email protected] <mailto:[email protected]>> wrote:

            Hi all,

            I posed a question to Suresh (see below), and he asked me
            to put this question on the dev list.

            So here it is. I will be grateful for any comments about
            the issues you all are facing, and what has come up in
            trying this, as it seems likely that this is a much simpler
            problem in concept than it is in practice, but its solution
            has many benefits.

            Here is my question:

            A group of us have been discussing how we might simplify
            submitting jobs to different compute resources in our
            current implementation of CIPRES, and how cloud computing
            might facilitate this. But none of us are cloud experts.
            As I understand it, the mesos cluster that I have been
            seeing in the Airavata email threads is intended to make
            it possible to deploy jobs to multiple virtual clusters. I
            am (we are) wondering if Mesos manages submissions to
            identical virtual clusters on multiple machines, and if
            that works efficiently.

            In our implementation, we have to change the rules to run
            efficiently on different machines, according to gpu
            availability, and cores per node. I am wondering how
            Mesos/ virtual clusters affect those considerations.
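
To make the kind of per-machine rule mentioned above concrete, here is a minimal Python sketch. It does not reflect the actual CIPRES rules: `NodeSpec`, `run_config`, and the example machine shapes are all hypothetical. It only shows how cores per node and GPU availability can change the way the same job is laid out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Shape of one node on a given machine (hypothetical values)."""
    cores_per_node: int
    gpus_per_node: int


def run_config(cores_wanted: int, node: NodeSpec) -> dict:
    """Pick node count, threads per node, and GPU use for one job."""
    # Ceiling division: enough nodes to cover the requested cores.
    nodes = -(-cores_wanted // node.cores_per_node)
    # Never ask for more threads than a node has cores.
    threads = min(cores_wanted, node.cores_per_node)
    return {
        "nodes": nodes,
        "threads_per_node": threads,
        "use_gpu": node.gpus_per_node > 0,
    }


# The same 48-core request lands differently on two machine shapes.
cpu_only = run_config(48, NodeSpec(cores_per_node=24, gpus_per_node=0))
gpu_box = run_config(16, NodeSpec(cores_per_node=32, gpus_per_node=4))
```

If virtual clusters really were identical across machines, a table of rules like this would collapse to a single entry, which is the simplification being asked about.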

            Can mesos create basically identical virtual clusters
            independent of machine?


            Thanks for any advice.

            Mark
