What do you mean by "meta-scheduler" here?  Are you trying to
coordinate running of jobs across or amongst a number of different
clusters?

On Fri, Sep 23, 2016 at 08:43:19PM +0000, Shenoy, Gourav Ganesh wrote:
> Hi Dev,
> 
> I am working on a project to build a Mesos-based meta-scheduler for 
> Airavata, along with Shameera & Mangirish. Here is the JIRA link: 
> https://issues.apache.org/jira/browse/AIRAVATA-2082.
> 
> 
> ·         We have identified the tasks needed to achieve this. At a high 
> level, they are:
> 
> 1.       Resource provisioning – We need to provision resources on cloud & 
> HPC infrastructures such as EC2, Jetstream, Comet, etc.
> 
> 2.       Building a cluster – Deploying a Mesos cluster on the set of nodes 
> obtained from (1) above for task management.
> 
> 3.       Selecting a scheduler – We need to investigate which scheduler to 
> use with the Mesos cluster. Options include Marathon and Aurora, but we need 
> to find one that suits our need to run serial as well as parallel (MPI) 
> jobs.
> 
> 4.       Installing & running applications on this cluster – Once the cluster 
> has been deployed and a scheduler choice made, we need to be able to install 
> and run applications on this cluster using Airavata.
> 
> 
> ·         So far we have looked into the following:
> 
> o    Resource provisioning:
> 
> §  We explored several options for provisioning resources – using cloud 
> libraries as well as via Ansible scripts.
> 
> §  We built an OpenStack4J Java module which provisions instances on 
> OpenStack-based clouds (e.g. Jetstream).
> 
> §  We also built a CloudBridge Python module for provisioning EC2 instances 
> on Amazon. CloudBridge can also be used to provision instances on OpenStack.
> 
> §  We wrote Ansible scripts for bringing up instances on both AWS and 
> OpenStack-based clouds.
> 
> 
> §  Key Points: CloudBridge and OpenStack4J are powerful libraries for 
> resource provisioning, but currently they do single-instance provisioning 
> and do not support templated boot options such as CloudFormation (for AWS) & 
> Heat (for OpenStack).
> 
> 
> o    Building a cluster:
> 
> §  We wrote an Ansible script for deploying a Mesos-Marathon cluster on a set 
> of nodes. This script installs necessary dependencies such as Zookeeper.
> 
> §  We tested this on OpenStack-based clouds & on EC2.
> 
> §  OpenStack Magnum provides excellent support for provisioning resources & 
> deploying a Mesos cluster, but we are running into some problems while 
> trying it.
> 
> 
> o    Installing a scheduler:
> 
> §  Our Ansible script currently installs Marathon as the scheduler on 
> Mesos. We haven’t yet submitted jobs using Marathon.
> 
> 
> ·         Although this is not finalized, we are inclined towards the Ansible 
> approach for the above, as Ansible also provides Python APIs, which will 
> allow us to integrate it with Airavata via Thrift. We will then be able to 
> invoke the Ansible scripts from code without needing the command-line 
> interface.
> 
> 
> ·         We are also working on the following items:
> 
> o    Exploring options to provision and deploy a Mesos-Marathon cluster on 
> HPC systems such as Comet. The challenge would be to use Ansible to provision 
> resources and deploy the cluster. Once we have a cluster, we can try running 
> applications.
> 
> o    Exploring different scheduler options for running serial and parallel 
> (MPI) jobs on such heterogeneous clusters.
> 
> o    Exploring orchestration options such as OpenStack Heat, AWS 
> CloudFormation, OpenStack Magnum, etc.
> 
> Any suggestions and comments are highly appreciated.
> 
> Thanks and Regards,
> Gourav Shenoy
> 
> 
