Hi Gourav,

Please go ahead and submit a proposal draft through the GSoC 2016 web site. I personally recommend the Google Doc option over posting drafts to the Airavata wiki, since I can make comments inline there.
Thanks,
Marlon

From: Gourav Rattihalli <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, March 21, 2016 at 10:22 AM
To: "[email protected]" <[email protected]>
Subject: [GSoC Proposal] - Integrating Job and Cloud Health Information of Apache Aurora with Apache Airavata

Hi Dev Team,

Please review the following GSoC proposal that I plan to submit:

Title: Integrating Job and Cloud Health Information of Apache Aurora with Apache Airavata

Abstract: This project will incorporate Apache Aurora to enable Airavata to launch jobs on large cloud environments and to collect information on the health of each job and of the cloud resources. The project will also analyze the current micro-services architecture of Airavata and develop code for an updated architecture for modules such as Logging. As a result, another outcome of this project will be a module that collects all the logging information from the various execution points in an Airavata job's lifecycle and provides search and mining capability.

Introduction: Apache Aurora is a service scheduler that runs on top of Apache Mesos. This combination enables long-running services that take advantage of Apache Mesos' scalability, fault tolerance, and resource isolation. Apache Mesos is a cluster manager that provides information about the state of the cluster, and Aurora uses that knowledge to make scheduling decisions. For example, when a machine fails, Aurora automatically reschedules the services that were running on it onto a healthy machine in order to keep them running. Aurora tracks each job as being in one of the following states: pending, assigned, starting, running, and finished. Apache Aurora requires a configuration file (".aurora") to launch jobs.
The following is an example of an Aurora configuration file:

    import os

    hello_world_process = Process(
      name = 'hello_world',
      cmdline = 'echo hello world')

    hello_world_task = Task(
      resources = Resources(cpu = 0.1, ram = 16 * MB, disk = 16 * MB),
      processes = [hello_world_process])

    hello_world_job = Job(
      cluster = 'cluster1',
      role = os.getenv('USER'),
      task = hello_world_task)

    jobs = [hello_world_job]

To launch the job with the above configuration we use:

    aurora job create cluster1/$USER/test/hello_world hello_world.aurora

This project will develop modules in Airavata to automatically generate the Aurora configuration file needed to launch a job on an Aurora-managed cluster in a cloud environment. The Aurora user interface, as shown in the web portal displayed above, provides detailed information on the job status, job name, start and finish times, location of the logs, and resource usage. This project will add a module to Apache Aurora to pull this detailed information using the Aurora HTTP API.

Goals:

* This project will investigate how Apache Aurora collects information about the cluster environment for display on the Aurora web interface. We will study the Aurora HTTP API, retrieve all the information related to the target infrastructure and job health, and make it available to the Airavata job submission module.
* We will process the information retrieved from Aurora and convert it into a format that Airavata can use for further action.
* We will use appropriate design patterns to integrate Aurora into the Airavata framework as one of the options for Big Data and Cloud resource frameworks.
* We will make the resource information from Aurora available for display on the Airavata dashboard.

Any comments and suggestions would be very helpful.

-Gourav Rattihalli
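
As a rough illustration of the configuration generation described in the proposal, the sketch below renders the text of a .aurora file from a few job parameters, using a plain string template modeled on the hello_world example. The function name render_aurora_config, its parameters, and the template layout are hypothetical and are not part of Airavata or Aurora; a real module would probably need to support multiple processes and additional Job fields.

```python
from string import Template

# Hypothetical template that mirrors the hello_world example in the proposal.
# The Process/Task/Job/Resources names are only emitted as text here; they are
# resolved by Aurora when it loads the generated file.
AURORA_TEMPLATE = Template("""\
import os

${name}_process = Process(
  name = '${name}',
  cmdline = ${cmdline})

${name}_task = Task(
  resources = Resources(cpu = ${cpu}, ram = ${ram_mb} * MB, disk = ${disk_mb} * MB),
  processes = [${name}_process])

${name}_job = Job(
  cluster = ${cluster},
  role = os.getenv('USER'),
  task = ${name}_task)

jobs = [${name}_job]
""")


def render_aurora_config(name, cmdline, cluster, cpu=0.1, ram_mb=16, disk_mb=16):
    """Return the text of a .aurora file for a single one-process job.

    All parameter names are assumptions made for this sketch.
    """
    # The job name is reused as a Python variable prefix in the generated file,
    # so it has to be a valid identifier.
    if not name.isidentifier():
        raise ValueError("job name must be a valid Python identifier: %r" % name)
    return AURORA_TEMPLATE.substitute(
        name=name,
        cmdline=repr(cmdline),  # repr() quotes and escapes the command string
        cluster=repr(cluster),
        cpu=cpu,
        ram_mb=ram_mb,
        disk_mb=disk_mb,
    )


if __name__ == "__main__":
    # Write the generated configuration next to the script.
    with open("hello_world.aurora", "w") as handle:
        handle.write(render_aurora_config("hello_world", "echo hello world", "cluster1"))
```

The generated file could then be passed to the "aurora job create" command shown in the proposal. A string template keeps the sketch short; building the configuration from structured data would be a reasonable alternative if the generated files become more complex.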
