Sure Suresh, will update my findings on the mailing list.

Thanks!

On Tue, Oct 18, 2016 at 7:59 AM, Suresh Marru <sma...@apache.org> wrote:
> Hi Mangirish,
>
> This is interesting. Looking forward to seeing what you find out further on gang scheduling support. Since the compute nodes are getting bigger, even exploring single-node MPI (on Jetstream using 22 cores) will help.
>
> Suresh
>
> P.S. Good to see the momentum on mailing list discussions on such topics.
>
> On Oct 18, 2016, at 1:54 AM, Mangirish Wagle <vaglomangir...@gmail.com> wrote:
>
> Hello Devs,
>
> Here is an update on some new learnings and thoughts based on my interactions with Mesos and Aurora devs.
>
> MPI implementations in Mesos repositories (like MPI Hydra) rely on obsolete MPI platforms and are no longer supported by the developer community. Hence it is not recommended that we use them for our purpose.
>
> One of the known ways of running MPI jobs over Mesos is "gang scheduling", which basically distributes the MPI run over multiple jobs on Mesos in place of multiple nodes. The challenge here is that the jobs need to be scheduled as one task, and an error in any job should collectively error out the main program, including all the distributed jobs.
>
> One of the Mesos developers (Niklas Nielsen) pointed me to his work on gang scheduling: https://github.com/nqn. This code may not be fully tested, but it is certainly a good starting point for exploring gang scheduling.
>
> One of the Aurora developers (Stephen Erb) suggests using gang scheduling on top of Aurora. The Aurora scheduler assumes that every job is independent. Hence, we would need to develop some external scaffolding to coordinate and schedule these jobs, which might not be trivial. One advantage of using Aurora as a backend for gang scheduling is that we would inherit the robustness of Aurora, which would otherwise be a key challenge if targeting bare Mesos.
>
> As an alternative to all the options above, I think we should probably be able to run a 1 node MPI job through Aurora.
> A resource offer with CPUs and memory from Mesos is abstracted as a single runtime, but is mapped to multiple nodes underneath, which would eventually exploit distributed resource capabilities.
>
> I intend to try out the 1 node MPI job submission approach first and simultaneously explore the gang scheduling approach.
>
> Please let me know your thoughts/suggestions.
>
> Best Regards,
> Mangirish
>
> On Thu, Oct 13, 2016 at 12:39 PM, Mangirish Wagle <vaglomangir...@gmail.com> wrote:
>
>> Hi Marlon,
>> Thanks for confirming and sharing the legal link.
>>
>> -Mangirish
>>
>> On Thu, Oct 13, 2016 at 12:13 PM, Pierce, Marlon <marpi...@iu.edu> wrote:
>>
>>> BSD is ok: https://www.apache.org/legal/resolved.
>>>
>>> *From:* Mangirish Wagle <vaglomangir...@gmail.com>
>>> *Reply-To:* "dev@airavata.apache.org" <dev@airavata.apache.org>
>>> *Date:* Thursday, October 13, 2016 at 12:03 PM
>>> *To:* "dev@airavata.apache.org" <dev@airavata.apache.org>
>>> *Subject:* Re: Running MPI jobs on Mesos based clusters
>>>
>>> Hello Devs,
>>>
>>> I needed some advice on the license of the MPI libraries. The MPICH library that I have been trying claims to have a "BSD Like" license (http://git.mpich.org/mpich.git/blob/HEAD:/COPYRIGHT).
>>>
>>> I am aware that OpenMPI, which uses a BSD license, is currently used in our application. I chose to start investigating MPICH because it claims to be a highly portable and high quality implementation of the latest MPI standard, suitable for cloud based clusters.
>>>
>>> If anyone could please advise on whether the MPICH library's "BSD Like" license is acceptable for ASF, that would help.
>>>
>>> Thank you.
>>>
>>> Best Regards,
>>>
>>> Mangirish Wagle
>>>
>>> On Thu, Oct 6, 2016 at 1:48 AM, Mangirish Wagle <vaglomangir...@gmail.com> wrote:
>>>
>>> Hello Devs,
>>>
>>> The network issue mentioned above now stands resolved.
>>> The problem was that iptables had some conflicting rules which blocked the traffic. It was resolved by a simple iptables flush.
>>>
>>> Here is the test MPI program running on multiple machines:-
>>>
>>> [centos@mesos-slave-1 ~]$ mpiexec -f machinefile -n 2 ./mpitest
>>> Hello world! I am process number: 0 on host mesos-slave-1
>>> Hello world! I am process number: 1 on host mesos-slave-2
>>>
>>> The next step is to try invoking this through a framework like Marathon. However, the job submission still does not run through Marathon. It seems to get stuck in the 'waiting' state forever (for example, http://149.165.170.245:8080/ui/#/apps/%2Fmaw-try). Further, I notice that Marathon is listed under 'inactive frameworks' in the Mesos dashboard (http://149.165.171.33:5050/#/frameworks).
>>>
>>> I am trying to get this working, though any help/clues with this would be really helpful.
>>>
>>> Thanks and Regards,
>>>
>>> Mangirish Wagle
>>>
>>> On Fri, Sep 30, 2016 at 9:21 PM, Mangirish Wagle <vaglomangir...@gmail.com> wrote:
>>>
>>> Hello Devs,
>>>
>>> I am currently running a sample MPI C program using 'mpiexec' provided by MPICH. I followed their installation guide <http://www.mpich.org/static/downloads/3.2/mpich-3.2-installguide.pdf> to install the libraries on the master and slave nodes of the Mesos cluster.
>>>
>>> The approach that I am trying out here is to equip the underlying nodes with MPI handling tools and then use a Mesos framework like Marathon/Aurora to submit jobs that run MPI programs by invoking these tools.
>>>
>>> You can potentially run an MPI program using mpiexec in the following manner:-
>>>
>>> # *mpiexec -f machinefile -n 2 ./mpitest*
>>>
>>> - *machinefile* -> File which contains an inventory of machines to run the program on and the number of processes on each machine.
>>> - *mpitest* -> MPI program compiled in C using the mpicc compiler. The program prints the process number and the hostname of the machine running the process.
>>> - *-n* -> Option indicating the number of processes to spawn.
>>>
>>> Example of machinefile contents:-
>>>
>>> # Entries in the format <hostname/IP>:<number of processes>
>>> mesos-slave-1:1
>>> mesos-slave-2:1
>>>
>>> The reason for choosing slaves is that Mesos runs the jobs on slaves, managed by 'agents' pertaining to the slaves.
>>>
>>> Output of the program with '-n 1':-
>>>
>>> # mpiexec -f machinefile -n 1 ./mpitest
>>> Hello world! I am process number: 0 on host mesos-slave-1
>>>
>>> But when I try '-n 2', I hit the following error:-
>>>
>>> # mpiexec -f machinefile -n 2 ./mpitest
>>> [proxy:0:1@mesos-slave-2] HYDU_sock_connect (/home/centos/mpich-3.2/src/pm/hydra/utils/sock/sock.c:172): unable to connect from "mesos-slave-2" to "mesos-slave-1" (No route to host)
>>> [proxy:0:1@mesos-slave-2] main (/home/centos/mpich-3.2/src/pm/hydra/pm/pmiserv/pmip.c:189): *unable to connect to server mesos-slave-1 at port 44788* (check for firewalls!)
>>>
>>> It seems that the program cannot execute because network traffic is being blocked. I checked the security groups in the scigap OpenStack for the mesos-slave-1 and mesos-slave-2 nodes, and they are set to the 'wideopen' policy. Furthermore, I tried adding explicit rules to the policies to allow all TCP and UDP (currently I am not sure what protocol is used underneath), but even then it continues throwing this error.
>>>
>>> Any clues, suggestions, or comments about the error or the approach as a whole would be helpful.
>>>
>>> Thanks and Regards,
>>>
>>> Mangirish Wagle
>>>
>>> On Tue, Sep 27, 2016 at 11:23 AM, Mangirish Wagle <vaglomangir...@gmail.com> wrote:
>>>
>>> Hello Devs,
>>>
>>> Thanks Gourav and Shameera for all the work w.r.t. setting up the Mesos-Marathon cluster on Jetstream.
>>>
>>> I am currently evaluating MPICH (http://www.mpich.org/about/overview/) to be used for launching MPI jobs on top of Mesos. MPICH version 1.2 supports Mesos based MPI scheduling. I have also been trying to submit jobs to the cluster through Marathon. However, in either case I am currently facing issues which I am working to get resolved.
>>>
>>> I am compiling my notes into the following Google doc. Please review it and let me know your comments and suggestions.
>>>
>>> https://docs.google.com/document/d/1p_Y4Zd4I4lgt264IHspXJli3la25y6bcPcmrTD6nR8g/edit?usp=sharing
>>>
>>> Thanks and Regards,
>>>
>>> Mangirish Wagle
>>>
>>> On Wed, Sep 21, 2016 at 3:20 PM, Shenoy, Gourav Ganesh <goshe...@indiana.edu> wrote:
>>>
>>> Hi Mangirish,
>>>
>>> I have set up a Mesos-Marathon cluster for you on Jetstream. I will share the cluster details with you in a separate email. Kindly note that there are 3 masters & 2 slaves in this cluster.
>>>
>>> I am also working on automating this process for Jetstream (similar to Shameera’s ansible script for EC2) and when that is ready, we can create clusters or add/remove slave machines from the cluster.
>>>
>>> Thanks and Regards,
>>>
>>> Gourav Shenoy
>>>
>>> *From:* Mangirish Wagle <vaglomangir...@gmail.com>
>>> *Reply-To:* "dev@airavata.apache.org" <dev@airavata.apache.org>
>>> *Date:* Wednesday, September 21, 2016 at 2:36 PM
>>> *To:* "dev@airavata.apache.org" <dev@airavata.apache.org>
>>> *Subject:* Running MPI jobs on Mesos based clusters
>>>
>>> Hello All,
>>>
>>> I would like to post, for everybody's awareness, about the study that I am undertaking this fall, i.e. to evaluate various frameworks that would facilitate MPI jobs on Mesos based clusters for Apache Airavata.
>>>
>>> Some of the options that I am looking at are:-
>>>
>>> 1. MPI support framework bundled with Mesos
>>> 2. Apache Aurora
>>> 3. Marathon
>>> 4. Chronos
>>>
>>> Some of the evaluation criteria on which I am planning to base my investigation are:-
>>>
>>> - Ease of setup
>>> - Documentation
>>> - Reliability features like HA
>>> - Scaling and Fault recovery
>>> - Performance
>>> - Community Support
>>>
>>> Gourav and Shameera are working on ansible based automation to spin up a Mesos based cluster and I am planning to use it to set up a cluster for experimentation.
>>>
>>> Any suggestions or information about prior work on this would be highly appreciated.
>>>
>>> Thank you.
>>>
>>> Best Regards,
>>>
>>> Mangirish Wagle
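
For anyone following the thread, the machinefile format and the mpiexec invocation described in the Sep 30 message can be sketched in Python as below. This is only an illustration under stated assumptions, not code used in the thread: the helper names `write_machinefile` and `build_mpiexec_command` are hypothetical, and the sketch only builds the command line rather than running it, so it does not need MPICH installed.

```python
import tempfile
from pathlib import Path


def write_machinefile(path, hosts):
    """Write one '<hostname/IP>:<number of processes>' entry per line.

    `hosts` is a list of (hostname, process_count) pairs.
    Hypothetical helper, named here for illustration only.
    """
    lines = [f"{host}:{count}" for host, count in hosts]
    Path(path).write_text("\n".join(lines) + "\n")
    return lines


def build_mpiexec_command(machinefile, num_processes, program):
    """Return the argv list for 'mpiexec -f <machinefile> -n <N> <program>'."""
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    return ["mpiexec", "-f", str(machinefile), "-n", str(num_processes), program]


# Hosts taken from the thread: one process on each of the two slaves.
hosts = [("mesos-slave-1", 1), ("mesos-slave-2", 1)]

# Write the machinefile to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    machinefile_path = Path(tmp) / "machinefile"
    entries = write_machinefile(machinefile_path, hosts)
    machinefile_text = machinefile_path.read_text()

# Use the plain file name so the command matches the one quoted in the thread.
command = build_mpiexec_command("machinefile", 2, "./mpitest")
print(" ".join(command))
```

On a node where MPICH is installed, the resulting list could be handed to `subprocess.run` to launch the job; that step is left out so the sketch stays runnable anywhere.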