I haven't gotten to do that unfortunately. It's on my to-do list for my own client.
Either way, I think you might get better info if you ask on one of the Aurora mailing lists.

-Renan

On Thu, Oct 27, 2016 at 5:36 PM, Shenoy, Gourav Ganesh <goshe...@indiana.edu> wrote:

> *@Renan*,
>
> I had a question – what is the default thrift port for the Aurora scheduler, which uses TBinaryProtocol?
>
> I have installed the Aurora-0.16 scheduler/executor on the Mesos-1.0.1 cluster, and have only been able to use the THttpClient over TJSONProtocol (port 8081). The Aurora site mentions that they have enabled TBinaryProtocol for the 0.16 version, but somehow I am not able to find the binary port. It would be great if you could provide some guidance here.
>
> Thanks and Regards,
> Gourav Shenoy
>
> *From: *Renan DelValle <rdelv...@binghamton.edu>
> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
> *Date: *Thursday, October 27, 2016 at 4:31 PM
> *To: *Suresh Marru <sma...@apache.org>
> *Cc: *Airavata Dev <dev@airavata.apache.org>, Madhusudhan Govindaraju <mgovi...@binghamton.edu>
> *Subject: *Re: Mesos based meta-scheduling for Airavata
>
> I wish I had the bandwidth to help with this. I'll do my best to answer any pointed questions (if there are any) on the Aurora irc/slack chat.
>
> -Renan
>
> On Oct 17, 2016 11:38 PM, "Suresh Marru" <sma...@apache.org> wrote:
>
> Hi Renan,
>
> Since you did a similar exercise using Go [1], it will be nice to see your feedback and guidance on the discussions Gourav is summarizing below.
>
> Suresh
>
> [1] - http://markmail.org/thread/ymj7yqvvbhrjwv3s
>
> On Oct 17, 2016, at 11:32 PM, Shenoy, Gourav Ganesh <goshe...@indiana.edu> wrote:
>
> Hi dev,
>
> Now that I have been able to get jobs scheduled via Aurora, I thought I should summarize my understanding. I would also like to briefly outline the plan I am working on with respect to using Mesos with Airavata.
>
> *Apache Aurora:*
>
> ·  Aurora, similar to Marathon & Chronos, is a service scheduler framework for Mesos. It has been built for scheduling long-running services & cron jobs on Mesos.
> ·  The advantage of Aurora (over Marathon & Chronos) is that it works well for one-off jobs as well – i.e. if I want to run a job and get the output, Aurora is a better fit than Marathon & Chronos, since Marathon will never let the job exit (and keeps restarting it on slaves) & Chronos is ONLY for crons.
> ·  Aurora also allows fine-grained control of the jobs that need to be submitted, through the concept of jobs, tasks and processes – a job can consist of one or more tasks, and a task can consist of one or more processes.
> ·  Aurora manages jobs that are made up of tasks; Mesos manages the tasks that consist of processes; Thermos (the Aurora executor) manages the processes.
> ·  We can control resource utilization at the task level because of the job abstractions above that Aurora provides.
> ·  Among many other features, a useful one is resource-quota management for users & the ability to support multiple users running jobs.
>
> *Current focus:*
>
> ·  I am currently working on building a Thrift-based client for Aurora, and have been successful in implementing one, but with limited operations.
> ·  I will be adding support for more operations, keeping them aligned with Airavata job submission/monitoring requirements.
> ·  I am currently focusing on targeting Airavata deployment to Mesos on a single cluster (eg: AWS). The flow would look as follows:
>
> <image001.png>
>
> ·  As you can see, currently there is just a single Mesos cluster. The future focus would be to expand this to have multiple clusters.
>
> *Subsequent work:*
>
> ·  Once we are able to test Airavata deployment to a single cluster successfully, we can expand this to a multi-cluster environment.
> ·  Here we would have multiple Mesos clusters which would somehow need to be managed. But the overall flow would look as follows:
>
> <image002.png>
>
> ·  We can either have multiple Mesos masters (one for each individual cluster) that are connected to each other via VPN, or have a single master – in which case we would need to consider all other nodes as slaves.
> ·  This is a design issue which needs discussion, and Suresh has some ideas on how to do this.
>
> Thanks and Regards,
> Gourav Shenoy
>
> *From: *Suresh Marru <sma...@apache.org>
> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
> *Date: *Friday, October 7, 2016 at 11:43 PM
> *To: *Airavata Dev <dev@airavata.apache.org>
> *Subject: *Re: Mesos based meta-scheduling for Airavata
>
> Hi Gourav,
>
> Thank you for the nice informative summaries; posts like these are always educational. Keep ’em coming.
>
> Suresh
>
> On Oct 7, 2016, at 10:56 PM, Shenoy, Gourav Ganesh <goshe...@indiana.edu> wrote:
>
> Hi dev,
>
> I have been exploring different frameworks for Mesos which would help our use-case of providing Airavata the capability to run jobs in a Mesos based ecosystem. In particular, I have been playing around with Marathon & Chronos and I am now going to be working on Apache Aurora.
>
> I have summarized my understanding about Mesos, Marathon & Chronos below. I will send out a separate email about Aurora later.
>
> *Apache Mesos:*
>
> ·  Apache Mesos is an open-source cluster manager, in the sense that it helps deploy & manage different frameworks (or applications) in a large clustered environment easily.
> ·  Mesos provides the ability to utilize the underlying shared pool of nodes as a single compute unit – that is, it can run many applications on these nodes efficiently.
> ·  Mesos uses the concept of “offers” for scheduling and running jobs on the underlying nodes.
>    When a framework (application) wants to run computations/jobs on the cluster, Mesos will decide how many resources it will “offer” that framework based on availability. The framework will then decide which resources to use from the offer, and subsequently run the computation/job on those resources.
> ·  In a typical cluster, you will have 3 or more Mesos masters & multiple Mesos slaves. Multiple Mesos masters help in providing high availability – if one master goes down, Mesos will elect a new leader (master) using Zookeeper.
> ·  The task mentioned above of providing “offers” to frameworks is done by a master, whereas the slaves are the ones that run these computations.
>
> ·  Some additional points:
>    o  I built a Mesos cluster with 3 masters & 2 slaves on EC2.
>    o  Each master & slave has 1GB of RAM & 1 vCPU with 20GB of disk space.
>
> *Marathon:*
>
> ·  Marathon is considered a framework that runs on top of Mesos. It is a container orchestration platform for Mesos and essentially acts as a service scheduler.
> ·  It is named “marathon” because it is intended for long-running applications. That is, Marathon makes sure that the service it is running never stops – if a service goes down or the slave on which the service is running dies, Marathon keeps restarting it on different slaves.
> ·  In some sense Marathon is very good for ensuring high availability of services. That is, instead of running services directly on Mesos, run them in Marathon if you never want them to die.
>    *Note*: You can decide to run a service on multiple slave nodes and, if resources on these slaves are available, Mesos will “offer” them to Marathon.
> ·  It is called a container orchestration platform because it “launches” these services inside a container – either a Docker OR a Mesos container.
> ·  In my opinion it is not a suitable “job scheduler” for Airavata because in Airavata we need to run a job and get the output rather than keeping it running always. Instead, we can run other schedulers – Chronos/Aurora – as a service in Marathon.
>
> *Chronos:*
>
> ·  Chronos is a cron scheduler for Mesos. It is good for running scheduled jobs – jobs that need to be run a certain number of times, repeatedly after certain intervals.
> ·  Chronos also provides the ability to add dependencies between jobs – that is, if job1 depends on job2, Chronos will run job2 first and then run job1 after job2 completes. It also builds a Directed Acyclic Graph (DAG) based on these dependencies.
> ·  Similar to Marathon, Chronos receives “offers” from the Mesos master whenever it needs to run a job on Mesos.
> ·  Again, I found that Chronos does not fit the Airavata use-case since I could not find a way to run one-off jobs via Chronos – you need to specify an interval time for Chronos, & Chronos then re-runs the job after that interval is complete (even if you decide to specify num. of repetitions=1).
>
> Some additional points:
>
> ·  Marathon & Chronos both have REST API support – eg: you can submit jobs via APIs along with other interactions such as listing jobs, etc.
> ·  I installed the Marathon & Chronos frameworks on the Mesos master nodes. This is what their health looks like on the Mesos dashboard:
>
> <image002.png>
>
> As you can see, there are 3 active tasks running in Chronos & 4 active tasks (long running) in Marathon.
>
> ·  I also installed Chronos as a service inside Marathon, and this is what it looks like in the Marathon UI:
>
> <image004.png>
>
> Interestingly, Chronos (as a service in Marathon) was smart enough to identify the jobs submitted via Chronos (as a framework on Mesos) & vice-versa.
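
A note on the job dependencies mentioned for Chronos above: a dependency graph only yields a valid run order when every job comes after the jobs it depends on. The sketch below illustrates that ordering with Python's standard `graphlib` module. It is not Chronos code and does not call any Chronos API; the job names and the `run_order` helper are hypothetical, chosen purely for illustration.

```python
from graphlib import TopologicalSorter


def run_order(dependencies):
    """Return job names ordered so that each job follows the jobs it depends on.

    `dependencies` maps a job name to the set of jobs it depends on.
    This helper is hypothetical and only illustrates DAG ordering.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical jobs: job1 depends on job2, and job3 depends on job1.
deps = {
    "job1": {"job2"},
    "job3": {"job1"},
}

order = run_order(deps)
print(order)
```

If the dependencies formed a cycle, no such order would exist, which is why the graph has to be acyclic.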
>
> ·  Also, the Mesos dashboard lists the active tasks it is running & details about which slave each task is running on. It also lists completed tasks. The “Sandbox” gives you access to the stdout/stderr files for the tasks as well as any other directories that were created as part of the task.
>
> <image005.png>
>
> Pardon me for this long email. Next, I will explore Apache Aurora, which seems a better fit for the Airavata use-case because it provides the features that Chronos supports and can also run one-off jobs.
>
> Thanks and Regards,
> Gourav Shenoy
>
> *From: *"Shenoy, Gourav Ganesh" <goshe...@indiana.edu>
> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
> *Date: *Friday, September 23, 2016 at 4:43 PM
> *To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
> *Subject: *Mesos based meta-scheduling for Airavata
>
> Hi Dev,
>
> I am working on this project of building a Mesos based meta-scheduler for Airavata, along with Shameera & Mangirish. Here is the jira link: https://issues.apache.org/jira/browse/AIRAVATA-2082.
>
> ·  We have identified some tasks that would be needed for achieving this, and at a high level it would consist of:
>    1. Resource provisioning – We need to provision resources on cloud & HPC infrastructures such as EC2, Jetstream, Comet, etc.
>    2. Building a cluster – Deploying a Mesos cluster on the set of nodes obtained from (1) above for task management.
>    3. Selecting a scheduler – We need to investigate the scheduler to use with the Mesos cluster. Some of the options are Marathon and Aurora. But we need to find one that suits our needs of running serial as well as parallel (MPI) jobs.
>    4. Installing & running applications on this cluster – Once the cluster has been deployed and a scheduler choice made, we need to be able to install and run applications on this cluster using Airavata.
>
> ·  Until now we were able to look into the following:
>    o  Resource provisioning:
>       §  We explored several options for provisioning resources – using cloud libraries as well as via Ansible scripts.
>       §  We built an OpenStack4J Java module which would provision instances on OpenStack based clouds (eg: Jetstream).
>       §  We also built a CloudBridge Python module for provisioning EC2 instances on Amazon. CloudBridge can also be used to provision instances on OpenStack.
>       §  We wrote Ansible scripts for bringing up instances on both AWS and OpenStack based clouds.
>
>       §  *Key Points*: CloudBridge and OpenStack4J are powerful libraries for resource provisioning, but currently they do single-instance provisioning, and do not support templated boot options such as CloudFormation (for AWS) & Heat (for OpenStack).
>
>    o  Building a cluster:
>       §  We wrote an Ansible script for deploying a Mesos-Marathon cluster on a set of nodes. This script will install necessary dependencies such as Zookeeper.
>       §  We tested this on OpenStack based clouds & on EC2.
>       §  OpenStack Magnum provides excellent support for doing resource provisioning & deploying a Mesos cluster, but we are running into some problems while trying it.
>
>    o  Installing a scheduler:
>       §  Our Ansible script is currently installing Marathon as the scheduler on Mesos. We haven’t yet submitted jobs using Marathon.
>
> ·  Although not finalized, we are inclined towards using the Ansible approach for the above, as Ansible also provides Python APIs, which will allow us to integrate it with Airavata via Thrift. Hence we will be able to easily invoke the Ansible scripts from code without needing to use the command-line interface.
>
> ·  We are also progressively working on some work-items such as:
>    o  Exploring options to provision and deploy a Mesos-Marathon cluster on HPC systems such as Comet.
>       The challenge would be to use Ansible to provision resources and deploy the cluster. Once we have a cluster, we can try running applications.
>    o  Exploring different scheduler options for running serial and parallel (MPI) jobs on such heterogeneous clusters.
>    o  Exploring orchestration options such as OpenStack Heat, AWS CloudFormation, OpenStack Magnum, etc.
>
> Any suggestions and comments are highly appreciated.
>
> Thanks and Regards,
> Gourav Shenoy