Hi Devs,

As there were no objections, I imported my local Mesos Ansible script from https://github.com/shamrath/mesos-deployment, which is already under the Apache 2.0 License, into Apache Airavata, preserving all history.

Thanks,
Shameera.

On Wed, Sep 21, 2016 at 7:13 PM Shameera Rathnayaka <shameerai...@gmail.com> wrote:

> Hi Gourav,
>
> This is a known issue. I have already mentioned this workaround in the
> project README file; see below:
>
> 1. Set valid AWS credentials in roles/ec2/vars/aws-credential.yml. If it
>    doesn't work, add the following to the ec2 task in roles/ec2/tasks/main.yml:
>
>    aws_access_key: <your_valid_access_key>
>    aws_secret_key: <your_valid_secret_key>
>
> Regards,
> Shameera.
>
> On Wed, Sep 21, 2016 at 6:26 PM Shenoy, Gourav Ganesh <goshe...@indiana.edu> wrote:
>
>> Hi dev,
>>
>> I just hit another problem with the ansible script for mesos-deployment.
>> This issue is related to creating instances in ec2 using the ansible
>> playbook. The fix is mentioned later below.
>>
>> In particular, when you run the command (which would spin up 4 machines
>> in EC2):
>>
>> ansible-playbook -i hosts site.yml -t "ec2"
>>
>> you might see the below authentication error:
>>
>> TASK [ec2 : create a aws instace/s] ********************************************
>>
>> failed: [localhost] (item=gs-mesos-master-1) => {"failed": true, "item":
>> "gs-mesos-master-1", "msg": "No handler was ready to authenticate. 1
>> handlers were checked. ['HmacAuthV4Handler'] Check your credentials"}
>>
>> failed: [localhost] (item=gs-mesos-master-2) => {"failed": true, "item":
>> "gs-mesos-master-2", "msg": "No handler was ready to authenticate. 1
>> handlers were checked. ['HmacAuthV4Handler'] Check your credentials"}
>>
>> failed: [localhost] (item=gs-mesos-master-3) => {"failed": true, "item":
>> "gs-mesos-master-3", "msg": "No handler was ready to authenticate. 1
>> handlers were checked. ['HmacAuthV4Handler'] Check your credentials"}
>>
>> failed: [localhost] (item=gs-mesos-slave-1) => {"failed": true, "item":
>> "gs-mesos-slave-1", "msg": "No handler was ready to authenticate. 1
>> handlers were checked.
>> ['HmacAuthV4Handler'] Check your credentials"}
>>
>> This is because the ansible playbook is not able to authenticate the
>> user, even if you have updated the “roles/ec2/vars/aws-credential.yml” file
>> with your AWS access & secret keys.
>>
>> I was able to resolve this issue by adding the following (the
>> aws_access_key and aws_secret_key lines, highlighted in yellow in the
>> original mail) to the “roles/ec2/tasks/main.yml” file, which runs the task
>> of creating the EC2 instances.
>>
>> - name: create a aws instace/s
>>   ec2:
>>     aws_access_key: "{{aws_access_key}}"
>>     aws_secret_key: "{{aws_secret_key}}"
>>     key_name: "{{ key_name }}"
>>     region: us-east-1
>>
>> Basically, this ansible task had no way of knowing the user credentials
>> when it tried to create the instance(s), hence the error. Hope this helps!
>>
>> @Shameera,
>>
>> Is this a valid fix? If yes, could you update the ansible script? Thanks
>> in advance.
>>
>> Thanks and Regards,
>> Gourav Shenoy
>>
>> *From: *Suresh Marru <sma...@apache.org>
>> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
>> *Date: *Friday, September 16, 2016 at 11:02 PM
>> *To: *Airavata Dev <dev@airavata.apache.org>
>> *Subject: *Re: Spinup Mesos-Marathon Cluster for Hybrid Scheduling
>>
>> Hi Gourav,
>>
>> Thank you for this excellent communication. I hope others will follow suit
>> with such mailing list updates. When you post such nontrivial diagnoses to
>> the mailing list, others having trouble will be able to find this thread
>> and follow the steps to fix the problem.
>>
>> Hoping to see a lot more dev list threads similar to this one.
>>
>> Suresh
>>
>> On Sep 16, 2016, at 10:16 PM, Shenoy, Gourav Ganesh <goshe...@indiana.edu>
>> wrote:
>>
>> Hi dev,
>>
>> I finally managed to get the mesos-marathon cluster up & running using
>> the Ansible script. There were a couple of issues because of which things
>> were failing.
>> I have listed the problems faced during installation & the
>> solutions that fixed things for me.
>>
>> 1. Marathon was not getting installed. This is because Marathon just
>>    released a new build (version: 1.3.0-1.0.506.el7) 2 days back and
>>    apparently the RPM for this version is corrupt, and hence a plain “yum
>>    install marathon” fails. To get around this, I listed all versions of
>>    marathon present in the repository:
>>
>>    # yum --showduplicates list marathon | expand
>>    marathon.x86_64          1.1.3-1.0.503.el7          mesosphere
>>    marathon.x86_64          1.3.0-1.0.506.el7          mesosphere
>>
>>    The next latest version was 1.1.3-1.0.503.el7, which seemed stable, and
>>    hence I updated the ansible task to use this version for marathon.
>>
>>    In “roles/mesos-master/tasks/main.yml” I updated the following:
>>
>>    - name: install mesos and marathon
>>      yum:
>>        name: "{{ item }}"
>>      with_items:
>>        - mesos
>>        - marathon-1.1.3-1.0.503.el7
>>
>>    The mesos-marathon cluster installation worked perfectly fine after this
>>    change.
>>
>> 2. Even after this, the command “mesos-resolve `cat /etc/mesos/zk`” was
>>    failing with the error:
>>
>>    Failed to obtain the IP address for 'ip-172-30-1-197'; the DNS service
>>    may not be able to resolve it: Name or service not known
>>
>>    Apparently it couldn’t resolve the hostname for the local master machine.
>>    I resolved this issue by adding a host entry in each master node.
>>    E.g., on the master node which threw the above error, I added the host
>>    entry (/etc/hosts):
>>
>>    172.30.1.197 ip-172-30-1-197
>>
>>    After this I was able to get the master-ip and visit the mesos dashboard
>>    (master-ip:5050).
>>
>> 3. I noticed that although I was able to view the mesos dashboard,
>>    I couldn’t access the marathon dashboard. The connection to
>>    <master-ip>:8080 was getting refused. I then restarted the marathon
>>    service on the master node: sudo service marathon restart. After this I
>>    was able to access the marathon dashboard as well.
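
The host entry from step 2 can be derived from the hostname rather than typed by hand on each master. The following is only a minimal sketch, assuming the unresolved name follows the EC2-style pattern ip-A-B-C-D seen in the error above; the function name hosts_entry is hypothetical and not part of the mesos-deployment scripts.

```shell
#!/bin/sh
# Hypothetical helper, not part of the mesos-deployment scripts.
# Assumption: the hostname has the EC2-style form ip-A-B-C-D.
# Prints the matching /etc/hosts line "A.B.C.D ip-A-B-C-D".
hosts_entry() {
    host="$1"
    case "$host" in
        ip-*-*-*-*) ;;
        *) echo "unexpected hostname format: $host" >&2; return 1 ;;
    esac
    # Drop the "ip-" prefix, then turn the remaining dashes into dots.
    ip=$(printf '%s' "${host#ip-}" | sed 's/-/./g')
    printf '%s %s\n' "$ip" "$host"
}
```

On a master node this could be used as `hosts_entry "$(hostname -s)" | sudo tee -a /etc/hosts`, after checking the printed line by hand. The short hostname is used because a fully qualified name would not match the assumed pattern cleanly.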
>>
>> Hope this helps!
>>
>> Thanks and Regards,
>> Gourav Shenoy
>>
>> *From: *"Shenoy, Gourav Ganesh" <goshe...@indiana.edu>
>> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
>> *Date: *Friday, September 16, 2016 at 3:52 PM
>> *To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
>> *Subject: *Re: Spinup Mesos-Marathon Cluster for Hybrid Scheduling
>>
>> Hi Shameera,
>>
>> As discussed, after commenting out the “marathon” section the ansible
>> playbooks execute without errors. But when I try to get the master-ip using
>> “mesos-resolve”, I get an error:
>>
>> I SSH’ed into one of the master machines and tried to check the status of
>> the mesos-master service; it seems the service is in a failed state. See
>> the trace below:
>>
>> [centos@ip-172-30-1-39 ~]$ sudo service mesos-master status
>> Redirecting to /bin/systemctl status mesos-master.service
>> ● mesos-master.service - Mesos Master
>>    Loaded: loaded (/usr/lib/systemd/system/mesos-master.service; enabled; vendor preset: disabled)
>>    Active: activating (auto-restart) (Result: exit-code) since Fri 2016-09-16 19:46:37 UTC; 18s ago
>>   Process: 12608 ExecStart=/usr/bin/mesos-init-wrapper master (code=exited, status=1/FAILURE)
>>  Main PID: 12608 (code=exited, status=1/FAILURE)
>>
>> Sep 16 19:46:37 ip-172-30-1-39 systemd[1]: Unit mesos-master.service entered failed state.
>> Sep 16 19:46:37 ip-172-30-1-39 systemd[1]: mesos-master.service failed.
>>
>> Hope this helps debugging the problem.
>>
>> Thanks and Regards,
>> Gourav Shenoy
>>
>> *From: *Suresh Marru <sma...@apache.org>
>> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
>> *Date: *Friday, September 16, 2016 at 9:30 AM
>> *To: *Airavata Dev <dev@airavata.apache.org>
>> *Subject: *Re: Spinup Mesos-Marathon Cluster for Hybrid Scheduling
>>
>> Hi Shameera,
>>
>> All of these are great directions for Airavata; thank you for pushing the
>> Ansible and Mesos deployments on the clouds. I think it will be better if
>> we get your scripts into the Airavata repo and all of us collectively work
>> on them. It looks like at least Pankaj and Gourav will also be able to
>> contribute in addition to you.
>>
>> Suresh
>>
>> On Sep 15, 2016, at 8:59 PM, Shenoy, Gourav Ganesh <goshe...@indiana.edu>
>> wrote:
>>
>> Sure, thanks Shameera. I will try this.
>>
>> Thanks and Regards,
>> Gourav Shenoy
>>
>> *From: *Shameera Rathnayaka <shameerai...@gmail.com>
>> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
>> *Date: *Thursday, September 15, 2016 at 8:55 PM
>> *To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
>> *Subject: *Re: Spinup Mesos-Marathon Cluster for Hybrid Scheduling
>>
>> Interesting, I am also getting the same issue. The same script worked
>> perfectly yesterday. I suspect some issue with the marathon rpm. With the
>> marathon installation removed, Mesos gets installed without any issue.
>>
>> To remove the marathon installation, do the following in the
>> /roles/mesos-master/tasks/main.yml file:
>>
>> 1. Comment out marathon in the "install mesos and marathon" task.
>> 2. Comment out the last task, which starts marathon.
>>
>> Meanwhile, I will try to find the exact reason.
>>
>> ~ Shameera.
>>
>> On Thu, Sep 15, 2016 at 8:32 PM Shenoy, Gourav Ganesh <
>> goshe...@indiana.edu> wrote:
>>
>> Hi Shameera,
>>
>> I am using the same image which you used (centos_ami_7_2: ami-6d1c2007).
>>
>> Thanks and Regards,
>> Gourav Shenoy
>>
>> *From: *Shameera Rathnayaka <shameerai...@gmail.com>
>> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
>> *Date: *Thursday, September 15, 2016 at 8:26 PM
>> *To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
>> *Subject: *Re: Spinup Mesos-Marathon Cluster for Hybrid Scheduling
>>
>> Hi Gourav,
>>
>> According to the error, something went wrong while unpacking the marathon
>> bundle, see: Installing :
>> marathon-1.3.0-1.0.506.el7.x86_64 1/1 \nerror:
>> unpacking of archive failed on file /usr/bin/marathon;57daffff: cpio:
>> read\n Verifying :
>> marathon-1.3.0-1.0.506.el7.x86_64 1/1
>> \n\nFailed:\n marathon.x86_64 0:1.3.0-1.0.506.el7
>>
>> What OS image and version did you use to create the instances? I tested
>> with centos 7.2 and it works fine.
>>
>> ~ Shameera.
>>
>> On Thu, Sep 15, 2016 at 8:14 PM Shenoy, Gourav Ganesh <
>> goshe...@indiana.edu> wrote:
>>
>> Hi Shameera,
>>
>> I am trying to build a mesos cluster on EC2 using your playbooks. But I
>> am facing some issues. Please find the details below:
>>
>> *Details:*
>>
>> - I created 4 instances on EC2 (us-east-1 region) using the
>>   cloud-provisioning module (CloudBridge python). Out of the 4, 3 were meant
>>   to be mesos masters & 1 slave.
>>   *Note: The instance inbound & outbound traffic is wide open.*
>>
>> - I skipped step-1 & step-2 in your README, since I manually
>>   provisioned the instances. Next, I updated the “hosts” file with public
>>   IPs for all 4 instances, and also updated the
>>   “roles/zookeeper/vars/main.yml” file with the private IPs of the 3 master
>>   instances.
>>
>> - I executed the “ansible-playbook -i hosts site.yml -t
>>   "mesos-master"” command, and I get the following error:
>>
>> TASK [mesos-master : install firewalld] ****************************************
>> ok: [52.91.152.1]
>> ok: [52.87.235.79]
>> ok: [54.167.94.186]
>>
>> TASK [mesos-master : start firewalld] ******************************************
>> ok: [52.91.152.1]
>> ok: [52.87.235.79]
>> ok: [54.167.94.186]
>>
>> TASK [mesos-master : open ports] ***********************************************
>> ok: [52.91.152.1] => (item=5050/tcp)
>> ok: [52.87.235.79] => (item=5050/tcp)
>> ok: [54.167.94.186] => (item=5050/tcp)
>> ok: [52.87.235.79] => (item=8080/tcp)
>> ok: [54.167.94.186] => (item=8080/tcp)
>> ok: [52.91.152.1] => (item=8080/tcp)
>>
>> TASK [mesos-master : install utility - TODO delete this] ***********************
>> ok: [52.91.152.1] => (item=[u'vim'])
>> ok: [52.87.235.79] => (item=[u'vim'])
>> ok: [54.167.94.186] => (item=[u'vim'])
>>
>> TASK [mesos-master : add mesosphere rpm] ***************************************
>> ok: [52.91.152.1]
>> ok: [52.87.235.79]
>> ok: [54.167.94.186]
>>
>> TASK [mesos-master : install mesos and marathon] *******************************
>> failed: [52.91.152.1] (item=[u'mesos', u'marathon']) => {"changed": true,
>> "failed": true, "item": ["mesos", "marathon"], "msg": "Error unpacking rpm
>> package marathon-1.3.0-1.0.506.el7.x86_64\n", "rc": 1, "results": ["All
>> packages providing mesos are up to date", "Loaded plugins:
>> fastestmirror\nLoading mirror speeds from cached hostfile\n * base:
>> mirrors.tripadvisor.com\n * extras: centos.hostingxtreme.com\n * updates:
>> mirrors.greenmountainaccess.net\nResolving Dependencies\n--> Running
>> transaction check\n---> Package marathon.x86_64 0:1.3.0-1.0.506.el7 will be
>> installed\n--> Finished Dependency Resolution\n\nDependencies
>> Resolved\n\n================================================================================\n
>> Package Arch Version Repository
>> Size\n================================================================================\nInstalling:\n
>> marathon x86_64 1.3.0-1.0.506.el7 mesosphere
>> 17 M\n\nTransaction
>> Summary\n================================================================================\nInstall
>> 1 Package\n\nTotal download size: 17 M\nInstalled size: 89 M\nDownloading
>> packages:\nRunning transaction check\nRunning transaction test\nTransaction
>> test succeeded\nRunning transaction\n Installing :
>> marathon-1.3.0-1.0.506.el7.x86_64 1/1 \nerror:
>> unpacking of archive failed on file /usr/bin/marathon;57daffff: cpio:
>> read\n Verifying :
>> marathon-1.3.0-1.0.506.el7.x86_64 1/1
>> \n\nFailed:\n marathon.x86_64
>> 0:1.3.0-1.0.506.el7
>> \n\nComplete!\n"]}
>>
>> failed: [52.87.235.79] (item=[u'mesos', u'marathon']) => {"changed":
>> true, "failed": true, "item": ["mesos", "marathon"], "msg": "Error
>> unpacking rpm package marathon-1.3.0-1.0.506.el7.x86_64\n", "rc": 1,
>> "results": ["All packages providing mesos are up to date", "Loaded plugins:
>> fastestmirror\nLoading mirror speeds from cached hostfile\n * base:
>> mirrors.tripadvisor.com\n * extras: mirrors.evowise.com\n * updates:
>> mirrors.greenmountainaccess.net\nResolving Dependencies\n--> Running
>> transaction check\n---> Package marathon.x86_64 0:1.3.0-1.0.506.el7 will be
>> installed\n--> Finished Dependency Resolution\n\nDependencies
>> Resolved\n\n================================================================================\n
>> Package Arch Version Repository
>> Size\n================================================================================\nInstalling:\n
>> marathon x86_64 1.3.0-1.0.506.el7 mesosphere
>> 17 M\n\nTransaction
>> Summary\n================================================================================\nInstall
>> 1 Package\n\nTotal download size: 17 M\nInstalled size: 89 M\nDownloading
>> packages:\nRunning transaction check\nRunning transaction test\nTransaction
>> test succeeded\nRunning transaction\n Installing :
>> marathon-1.3.0-1.0.506.el7.x86_64 1/1 \nerror:
>> unpacking of archive failed on file /usr/bin/marathon;57daffff: cpio:
>> read\n Verifying :
>> marathon-1.3.0-1.0.506.el7.x86_64 1/1
>> \n\nFailed:\n marathon.x86_64
>> 0:1.3.0-1.0.506.el7
>> \n\nComplete!\n"]}
>>
>> failed: [54.167.94.186] (item=[u'mesos', u'marathon']) => {"changed":
>> true, "failed": true, "item": ["mesos", "marathon"], "msg": "Error
>> unpacking rpm package marathon-1.3.0-1.0.506.el7.x86_64\n", "rc": 1,
>> "results": ["All packages providing mesos are up to date", "Loaded plugins:
>> fastestmirror\nLoading mirror speeds from cached hostfile\n * base:
>> mirrors.tripadvisor.com\n * extras: mirrors.evowise.com\n * updates:
>> mirrors.greenmountainaccess.net\nResolving Dependencies\n--> Running
>> transaction check\n---> Package marathon.x86_64 0:1.3.0-1.0.506.el7 will be
>> installed\n--> Finished Dependency Resolution\n\nDependencies
>> Resolved\n\n================================================================================\n
>> Package Arch Version Repository
>> Size\n================================================================================\nInstalling:\n
>> marathon x86_64 1.3.0-1.0.506.el7 mesosphere
>> 17 M\n\nTransaction
>> Summary\n================================================================================\nInstall
>> 1 Package\n\nTotal download size: 17 M\nInstalled size: 89 M\nDownloading
>> packages:\nRunning transaction check\nRunning transaction test\nTransaction
>> test succeeded\nRunning transaction\n Installing :
>> marathon-1.3.0-1.0.506.el7.x86_64 1/1 \nerror:
>> unpacking of archive failed on file /usr/bin/marathon;57daffff: cpio:
>> read\n Verifying :
>> marathon-1.3.0-1.0.506.el7.x86_64 1/1
>> \n\nFailed:\n marathon.x86_64
>> 0:1.3.0-1.0.506.el7
>> \n\nComplete!\n"]}
>>
>> NO MORE HOSTS LEFT *************************************************************
>>
>> RUNNING HANDLER [zookeeper : restart zookeeper] ********************************
>> [WARNING]: Could not create retry file 'site.retry'. [Errno 2]
>> No such file or directory: ''
>>
>> PLAY RECAP *********************************************************************
>> 52.87.235.79               : ok=17   changed=2    unreachable=0    failed=1
>> 52.91.152.1                : ok=17   changed=2    unreachable=0    failed=1
>> 54.167.94.186              : ok=17   changed=2    unreachable=0    failed=1
>> localhost                  : ok=1    changed=0    unreachable=0    failed=0
>>
>> Is there some step that I am missing? It looks like the instances are not
>> able to communicate because of the firewall? This is just a wild guess. Any
>> help here is appreciated.
>>
>> Thanks and Regards,
>> Gourav Shenoy
>>
>> *From: *Shameera Rathnayaka <shameerai...@gmail.com>
>> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
>> *Date: *Monday, September 12, 2016 at 11:19 AM
>> *To: *dev <dev@airavata.apache.org>
>> *Subject: *Spinup Mesos-Marathon Cluster for Hybrid Scheduling
>>
>> Hi Dev,
>>
>> As part of the effort to use cloud infrastructure to run MPI and big data
>> jobs using Airavata, we use Apache Mesos as the resource allocation
>> framework to manage different types of clusters (i.e., HPC node clusters to
>> run MPI jobs, and Spark and Hadoop clusters to run big data applications).
>> I came up with an Ansible script to spin up a Mesos cluster on a target
>> set of nodes. You can find the script here:
>> https://github.com/shamrath/mesos-deployment
>> I am thinking of moving this code to Airavata if all agree. I would be
>> happy to answer any questions related to this.
>>
>> Thanks,
>> Shameera.
>>
>> --
>> Shameera Rathnayaka
>
> --
> Shameera Rathnayaka

--
Shameera Rathnayaka