What do you mean by "meta-scheduler" here? Are you trying to coordinate running of jobs across or amongst a number of different clusters?

On Fri, Sep 23, 2016 at 08:43:19PM +0000, Shenoy, Gourav Ganesh wrote:
> Hi Dev,
>
> I am working on this project of building a Mesos based meta-scheduler
> for Airavata, along with Shameera & Mangirish. Here is the jira link:
> https://issues.apache.org/jira/browse/AIRAVATA-2082.
>
> * We have identified some tasks that would be needed for achieving
>   this. At a high level they consist of:
>
>   1. Resource provisioning - We need to provision resources on cloud &
>      HPC infrastructures such as EC2, Jetstream, Comet, etc.
>
>   2. Building a cluster - Deploying a Mesos cluster on the set of nodes
>      obtained from (1) above for task management.
>
>   3. Selecting a scheduler - We need to investigate which scheduler to
>      use with the Mesos cluster. Some of the options are Marathon and
>      Aurora, but we need to find one that suits our needs of running
>      serial as well as parallel (MPI) jobs.
>
>   4. Installing & running applications on this cluster - Once the
>      cluster has been deployed and a scheduler chosen, we need to be
>      able to install and run applications on this cluster using
>      Airavata.
>
> * Until now we were able to look into the following:
>
>   - Resource provisioning:
>
>     * We explored several options for provisioning resources: using
>       cloud libraries as well as via Ansible scripts.
>
>     * We built an OpenStack4J Java module which provisions instances
>       on OpenStack based clouds (e.g., Jetstream).
>
>     * We also built a CloudBridge Python module for provisioning EC2
>       instances on Amazon. CloudBridge can also be used to provision
>       instances on OpenStack.
>
>     * We wrote Ansible scripts for bringing up instances on both AWS
>       and OpenStack based clouds.
>
>     * Key points: CloudBridge and OpenStack4J are powerful libraries
>       for resource provisioning, but currently they do single-instance
>       provisioning and do not support templated boot options such as
>       CloudFormation (for AWS) & Heat (for OpenStack).
>
>   - Building a cluster:
>
>     * We wrote an Ansible script for deploying a Mesos-Marathon cluster
>       on a set of nodes. This script installs necessary dependencies
>       such as Zookeeper.
>
>     * We tested this on OpenStack based clouds & on EC2.
>
>     * OpenStack Magnum provides excellent support for resource
>       provisioning & deploying a Mesos cluster, but we are running
>       into some problems while trying it.
>
>   - Installing a scheduler:
>
>     * Our Ansible script currently installs Marathon as the scheduler
>       on Mesos. We haven't yet submitted jobs using Marathon.
>
> * Although not finalized, we are inclined towards the Ansible approach
>   for the above, as Ansible also provides Python APIs, which will allow
>   us to integrate it with Airavata via Thrift. Hence we will be able to
>   easily invoke the Ansible scripts from code without needing to use
>   the command-line interface.
>
> * We are also progressively working on some work-items such as:
>
>   - Exploring options to provision and deploy a Mesos-Marathon cluster
>     on HPC systems such as Comet. The challenge would be to use Ansible
>     to provision resources and deploy the cluster. Once we have a
>     cluster, we can try running applications.
>
>   - Exploring different scheduler options for running serial and
>     parallel (MPI) jobs on such heterogeneous clusters.
>
>   - Exploring orchestration options such as OpenStack Heat, AWS
>     CloudFormation, OpenStack Magnum, etc.
>
> Any suggestions and comments are highly appreciated.
>
> Thanks and Regards,
> Gourav Shenoy
