Re: mesos and moving jobs between clusters

Pankaj Saha Tue, 25 Oct 2016 15:10:04 -0700

Hi Mark,

Mesos collects resource information from all the nodes in the cluster
(cores, memory, disk, and GPUs) and presents a unified view, as if the
cluster were a single operating system. Mesosphere, a commercial company
built around Mesos, has created an ecosystem with Mesos as its kernel, called
the "Data Center Operating System (DCOS)". Frameworks interact with Mesos to
obtain resources and then use those resources to run jobs on the cluster.
For example, if multiple frameworks such as Marathon, Apache Aurora, and a
custom MPI framework are running on Mesos, Mesos negotiates with each
framework over how many resources it gets. Once a framework, say Aurora,
gets resources, it decides how to use them. Strengths of Mesos include fault
tolerance at scale and the ability to co-schedule applications/frameworks on
the same cluster so that cluster utilization stays high.
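As a rough illustration of this two-level model (Mesos decides which framework receives an offer; the framework decides what to run on it), here is a toy Python sketch. It is not the Mesos API: `Resources`, `Offer`, `ToyFramework`, and `offer_round` are hypothetical names, and the round-robin offer policy is a simplification. Mesos's built-in allocator uses Dominant Resource Fairness (DRF) instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource vector for one agent or one task."""
    cpus: float
    mem_mb: int
    gpus: int = 0

    def fits_in(self, avail: "Resources") -> bool:
        # A request fits only if every dimension is within what is available.
        return (self.cpus <= avail.cpus and self.mem_mb <= avail.mem_mb
                and self.gpus <= avail.gpus)

    def __sub__(self, other: "Resources") -> "Resources":
        return Resources(self.cpus - other.cpus, self.mem_mb - other.mem_mb,
                         self.gpus - other.gpus)


@dataclass(frozen=True)
class Offer:
    agent: str
    resources: Resources


class ToyFramework:
    """Hypothetical framework that decides on its own how to use offers."""

    def __init__(self, name, tasks):
        self.name = name
        self.queue = list(tasks)  # pending (task_id, Resources) pairs
        self.launched = []        # (task_id, agent) pairs

    def resource_offers(self, offers):
        # Second level: the framework, not the master, chooses which
        # pending tasks to place on each offered agent.
        for offer in offers:
            remaining = offer.resources
            waiting = []
            for task_id, need in self.queue:
                if need.fits_in(remaining):
                    remaining = remaining - need
                    self.launched.append((task_id, offer.agent))
                else:
                    waiting.append((task_id, need))
            self.queue = waiting
            # Leftover `remaining` is dropped here, the toy equivalent
            # of declining the unused part of the offer.


def offer_round(agents, frameworks):
    """First level: a toy master offers each agent's resources to one
    framework, round-robin (a simplification of Mesos's DRF allocator)."""
    offers = {fw.name: [] for fw in frameworks}
    for i, agent in enumerate(sorted(agents)):
        fw = frameworks[i % len(frameworks)]
        offers[fw.name].append(Offer(agent, agents[agent]))
    for fw in frameworks:
        fw.resource_offers(offers[fw.name])


# Example: one CPU-only node and one GPU node shared by two frameworks.
agents = {"node1": Resources(8, 16384), "node2": Resources(4, 8192, gpus=1)}
aurora = ToyFramework("aurora", [("t1", Resources(2, 4096)),
                                 ("t2", Resources(2, 4096))])
mpi = ToyFramework("mpi", [("gpu-job", Resources(1, 2048, gpus=1))])
offer_round(agents, [aurora, mpi])
print(aurora.launched)  # [('t1', 'node1'), ('t2', 'node1')]
print(mpi.launched)     # [('gpu-job', 'node2')]
```

If the GPU job had been offered only `node1`, it would simply stay in the framework's queue until an offer with a GPU arrived; the placement rules (GPUs, cores per node) live in the framework, not in the master.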

Off the shelf, Mesos only works when the master and agent nodes can
communicate directly with each other. We have worked on modifying the Mesos
installation so that it works even when agents are behind firewalls on
campus clusters. We are also working on getting the same setup to work on
Jetstream and Chameleon, where allocations are a mix of public IPs and
internally accessible nodes. This will allow us to use Mesos to
meta-schedule across clusters. We are also developing our own framework to
customize scheduling and resource negotiation for science gateways on Mesos
clusters. Our plan is to work with Suresh and Marlon's team so that it works
with Airavata.

I will be presenting at the Gateways workshop in November and will also be
at SC with my adviser (Madhu Govindaraju), if you would like to discuss any
of these projects.

We are working on packaging our work so that it can be shared with this
community.

Thanks

Pankaj

On Tue, Oct 25, 2016 at 11:36 AM, Mangirish Wagle <[email protected]>
wrote:

> Hi Mark,
>
> Thanks for your question. If I understand you correctly, you need a kind
> of load balancing across identical clusters through a single Mesos master?
>
> With the current setup, as I understand it, we have a separate Mesos
> master for each cluster on separate clouds. However, whether a single Mesos
> master can target multiple identical clusters is a good topic to
> investigate. We have some ongoing work that uses a virtual cluster setup
> with compute resources across clouds to install Mesos, but I am not sure
> if that is what you are looking for.
>
> Regards,
> Mangirish
>
> On Tue, Oct 25, 2016 at 11:05 AM, Miller, Mark <[email protected]> wrote:
>
>> Hi all,
>>
>> I posed a question to Suresh (see below), and he asked me to put this
>> question on the dev list.
>>
>> So here it is. I would be grateful for any comments about the issues you
>> are all facing and what has come up in trying this, as it seems likely
>> that this problem is much simpler in concept than in practice, but its
>> solution has many benefits.
>>
>> Here is my question:
>>
>> A group of us have been discussing how we might simplify submitting jobs
>> to different compute resources in our current implementation of CIPRES, and
>> how cloud computing might facilitate this. But none of us are cloud
>> experts. As I understand it, the Mesos cluster that I have been seeing in
>> the Airavata email threads is intended to make it possible to deploy jobs
>> to multiple virtual clusters. I am (we are) wondering whether Mesos manages
>> submissions to identical virtual clusters on multiple machines, and whether
>> that works efficiently.
>>
>> In our implementation, we have to change the rules for running efficiently
>> on different machines according to GPU availability and cores per node. I am
>> wondering how Mesos/virtual clusters affect those considerations.
>>
>> Can Mesos create essentially identical virtual clusters independent of the
>> underlying machine?
>>
>> Thanks for any advice.
>>
>> Mark
>>
>
>
