Hi Gourav,

This is a known issue. I have already mentioned the above workaround in the project README file; see below.
1. Set valid AWS credentials in roles/ec2/vars/aws-credential.yml

If that doesn't work, add the following to the ec2 task in roles/ec2/tasks/main.yml:

aws_access_key: <your_valid_access_key>
aws_secret_key: <your_valid_secret_key>

Regards,
Shameera.

On Wed, Sep 21, 2016 at 6:26 PM Shenoy, Gourav Ganesh <[email protected]> wrote:

> Hi dev,
>
> I just hit another problem with the ansible script for mesos-deployment.
> This issue is related to creating instances in ec2 using the ansible
> playbook. The fix is mentioned later below.
>
> In particular, when you run the command (which would spin up 4 machines in
> EC2):
>
> ansible-playbook -i hosts site.yml -t "ec2"
>
> you might see the authentication error below:
>
> TASK [ec2 : create a aws instace/s] ********************************************
>
> failed: [localhost] (item=gs-mesos-master-1) => {"failed": true, "item":
> "gs-mesos-master-1", "msg": "No handler was ready to authenticate. 1
> handlers were checked. ['HmacAuthV4Handler'] Check your credentials"}
>
> failed: [localhost] (item=gs-mesos-master-2) => {"failed": true, "item":
> "gs-mesos-master-2", "msg": "No handler was ready to authenticate. 1
> handlers were checked. ['HmacAuthV4Handler'] Check your credentials"}
>
> failed: [localhost] (item=gs-mesos-master-3) => {"failed": true, "item":
> "gs-mesos-master-3", "msg": "No handler was ready to authenticate. 1
> handlers were checked. ['HmacAuthV4Handler'] Check your credentials"}
>
> failed: [localhost] (item=gs-mesos-slave-1) => {"failed": true, "item":
> "gs-mesos-slave-1", "msg": "No handler was ready to authenticate. 1
> handlers were checked. ['HmacAuthV4Handler'] Check your credentials"}
>
> This is because the ansible playbook is not able to authenticate the user,
> even if you have updated the “roles/ec2/vars/aws-credential.yml” file with
> your AWS access & secret keys.
>
> I was able to resolve this issue by adding the following (the
> aws_access_key and aws_secret_key lines) to the
> “roles/ec2/tasks/main.yml” file, which runs the task of creating the EC2
> instances.
>
> - name: create a aws instace/s
>   ec2:
>     aws_access_key: "{{aws_access_key}}"
>     aws_secret_key: "{{aws_secret_key}}"
>     key_name: "{{ key_name }}"
>     region: us-east-1
>
> Basically, this ansible task had no way of knowing the user credentials
> when it tried to create the instance(s), hence the error. Hope this helps!
>
> @Shameera,
> Is this a valid fix? If yes, could you update the ansible script? Thanks
> in advance.
>
> Thanks and Regards,
> Gourav Shenoy
>
> From: Suresh Marru <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Friday, September 16, 2016 at 11:02 PM
> To: Airavata Dev <[email protected]>
> Subject: Re: Spinup Mesos-Marathon Cluster for Hybrid Scheduling
>
> Hi Gourav,
>
> Thank you for this excellent communication. Hope others will follow suit
> with such mailing list updates. When you post such a nontrivial diagnosis
> to the mailing list, others having trouble will be able to search for this
> thread and follow these steps to fix the problem.
>
> Hoping to see a lot more dev list threads similar to this one.
>
> Suresh
>
> On Sep 16, 2016, at 10:16 PM, Shenoy, Gourav Ganesh <[email protected]>
> wrote:
>
> Hi dev,
>
> I finally managed to get the mesos-marathon cluster up & running using the
> Ansible script. There were a couple of issues because of which things were
> failing. I have listed the problems faced during installation & the
> solutions that fixed things for me.
>
> 1. Marathon was not getting installed. This is because Marathon just
> released a new build (version: 1.3.0-1.0.506.el7) 2 days back and
> apparently the RPM for this version is corrupt, and hence a plain “yum
> install marathon” fails.
> To get around this, I listed all versions of marathon present in the
> repository:
>
> # yum --showduplicates list marathon | expand
> marathon.x86_64    1.1.3-1.0.503.el7    mesosphere
> marathon.x86_64    1.3.0-1.0.506.el7    mesosphere
>
> The next latest version was 1.1.3-1.0.503.el7, which seemed stable, so
> I updated the ansible task to use this version of marathon.
>
> In “roles/mesos-master/tasks/main.yml” I updated the following:
>
> - name: install mesos and marathon
>   yum:
>     name: "{{ item }}"
>   with_items:
>     - mesos
>     - marathon-1.1.3-1.0.503.el7
>
> The mesos-marathon cluster installation worked perfectly fine after this
> change.
>
> 2. Even after this, the command “mesos-resolve `cat /etc/mesos/zk`”
> was failing with the error:
>
> Failed to obtain the IP address for 'ip-172-30-1-197'; the DNS service
> may not be able to resolve it: Name or service not known
>
> Apparently it couldn’t resolve the hostname for the local master machine.
> I resolved this issue by adding a host entry on each master node.
> E.g., on the master node that threw the above error, I added this host
> entry to /etc/hosts:
>
> 172.30.1.197 ip-172-30-1-197
>
> After this I was able to get the master-ip and visit the mesos dashboard
> (master-ip:5050).
>
> 3. I noticed that although I was able to view the mesos dashboard,
> I couldn’t access the marathon dashboard. The connection to
> <master-ip>:8080 was getting refused. I then restarted the marathon service
> on the master node (sudo service marathon restart). After this I was able
> to access the marathon dashboard as well.
>
> Hope this helps!
>
> Thanks and Regards,
> Gourav Shenoy
>
> From: "Shenoy, Gourav Ganesh" <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Friday, September 16, 2016 at 3:52 PM
> To: "[email protected]" <[email protected]>
> Subject: Re: Spinup Mesos-Marathon Cluster for Hybrid Scheduling
>
> Hi Shameera,
>
> As discussed, after commenting out the “marathon” section the ansible
> playbooks execute without errors. But when I try to get the master-ip using
> “mesos-resolve”, I get an error:
>
> I SSH’ed into one of the master machines and checked the status of the
> mesos-master service; it seems the service is in a failed state. See
> the trace below:
>
> [centos@ip-172-30-1-39 ~]$ sudo service mesos-master status
> Redirecting to /bin/systemctl status mesos-master.service
> ● mesos-master.service - Mesos Master
>    Loaded: loaded (/usr/lib/systemd/system/mesos-master.service; enabled; vendor preset: disabled)
>    Active: activating (auto-restart) (Result: exit-code) since Fri 2016-09-16 19:46:37 UTC; 18s ago
>   Process: 12608 ExecStart=/usr/bin/mesos-init-wrapper master (code=exited, status=1/FAILURE)
>  Main PID: 12608 (code=exited, status=1/FAILURE)
>
> Sep 16 19:46:37 ip-172-30-1-39 systemd[1]: Unit mesos-master.service entered failed state.
> Sep 16 19:46:37 ip-172-30-1-39 systemd[1]: mesos-master.service failed.
>
> Hope this helps in debugging the problem.
>
> Thanks and Regards,
> Gourav Shenoy
>
> From: Suresh Marru <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Friday, September 16, 2016 at 9:30 AM
> To: Airavata Dev <[email protected]>
> Subject: Re: Spinup Mesos-Marathon Cluster for Hybrid Scheduling
>
> Hi Shameera,
>
> All of these are great directions for Airavata; thank you for pushing the
> Ansible and Mesos deployments on the clouds.
> I think it will be better if we get your scripts into the Airavata repo
> and all of us collectively work on them. It looks like at least Pankaj and
> Gourav will also be able to contribute in addition to you.
>
> Suresh
>
> On Sep 15, 2016, at 8:59 PM, Shenoy, Gourav Ganesh <[email protected]>
> wrote:
>
> Sure, thanks Shameera. I will try this.
>
> Thanks and Regards,
> Gourav Shenoy
>
> From: Shameera Rathnayaka <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Thursday, September 15, 2016 at 8:55 PM
> To: "[email protected]" <[email protected]>
> Subject: Re: Spinup Mesos-Marathon Cluster for Hybrid Scheduling
>
> Interesting, I am also getting the same issue. The same script worked
> perfectly yesterday. I suspect some issue with the marathon RPM. With the
> marathon installation removed, Mesos gets installed without any issue.
>
> To remove the marathon installation, do the following in the
> /roles/mesos-master/tasks/main.yml file:
>
> 1. Comment out marathon in the "install mesos and marathon" task
> 2. Comment out the last task, which starts marathon
>
> Meanwhile, I will try to find the exact reason.
>
> ~ Shameera.
>
> On Thu, Sep 15, 2016 at 8:32 PM Shenoy, Gourav Ganesh <[email protected]>
> wrote:
>
> Hi Shameera,
>
> I am using the same image which you used (centos_ami_7_2: ami-6d1c2007).
>
> Thanks and Regards,
> Gourav Shenoy
>
> From: Shameera Rathnayaka <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Thursday, September 15, 2016 at 8:26 PM
> To: "[email protected]" <[email protected]>
> Subject: Re: Spinup Mesos-Marathon Cluster for Hybrid Scheduling
>
> Hi Gourav,
>
> According to the error, something went wrong while unpacking the marathon
> bundle; see:
>
> Installing : marathon-1.3.0-1.0.506.el7.x86_64 1/1 \nerror:
> unpacking of archive failed on file /usr/bin/marathon;57daffff: cpio:
> read\n Verifying : marathon-1.3.0-1.0.506.el7.x86_64 1/1
> \n\nFailed:\n marathon.x86_64 0:1.3.0-1.0.506.el7
>
> What OS image and version did you use to create the instances? I tested
> with centos 7.2 and it works fine.
>
> ~ Shameera.
>
> On Thu, Sep 15, 2016 at 8:14 PM Shenoy, Gourav Ganesh <[email protected]>
> wrote:
>
> Hi Shameera,
>
> I am trying to build a mesos cluster on EC2 using your playbooks, but I am
> facing some issues. Please find the details below:
>
> Details:
>
> - I created 4 instances on EC2 (us-east-1 region) using the
> cloud-provisioning module (CloudBridge python). Out of the 4, 3 were meant
> to be mesos masters & 1 slave.
> Note: The instance inbound & outbound traffic is wide open.
>
> - I skipped step-1 & step-2 in your README, since I manually
> provisioned the instances. Next, I updated the “hosts” file with public IPs
> for all 4 instances, and also updated the “roles/zookeeper/vars/main.yml”
> file with the private IPs of the 3 master instances.
>
> - I executed the “ansible-playbook -i hosts site.yml -t "mesos-master"”
> command, and I get the following error:
>
> TASK [mesos-master : install firewalld] ****************************************
> ok: [52.91.152.1]
> ok: [52.87.235.79]
> ok: [54.167.94.186]
>
> TASK [mesos-master : start firewalld] ******************************************
> ok: [52.91.152.1]
> ok: [52.87.235.79]
> ok: [54.167.94.186]
>
> TASK [mesos-master : open ports] ***********************************************
> ok: [52.91.152.1] => (item=5050/tcp)
> ok: [52.87.235.79] => (item=5050/tcp)
> ok: [54.167.94.186] => (item=5050/tcp)
> ok: [52.87.235.79] => (item=8080/tcp)
> ok: [54.167.94.186] => (item=8080/tcp)
> ok: [52.91.152.1] => (item=8080/tcp)
>
> TASK [mesos-master : install utility - TODO delete this] ***********************
> ok: [52.91.152.1] => (item=[u'vim'])
> ok: [52.87.235.79] => (item=[u'vim'])
> ok: [54.167.94.186] => (item=[u'vim'])
>
> TASK [mesos-master : add mesosphere rpm] ***************************************
> ok: [52.91.152.1]
> ok: [52.87.235.79]
> ok: [54.167.94.186]
>
> TASK [mesos-master : install mesos and marathon] *******************************
>
> failed: [52.91.152.1] (item=[u'mesos', u'marathon']) => {"changed": true,
> "failed": true, "item": ["mesos", "marathon"], "msg": "Error unpacking rpm
> package marathon-1.3.0-1.0.506.el7.x86_64\n", "rc": 1, "results": ["All
> packages providing mesos are up to date", "Loaded plugins:
> fastestmirror\nLoading mirror speeds from cached hostfile\n * base:
> mirrors.tripadvisor.com\n * extras: centos.hostingxtreme.com\n * updates:
> mirrors.greenmountainaccess.net\nResolving Dependencies\n--> Running
> transaction check\n---> Package marathon.x86_64 0:1.3.0-1.0.506.el7 will be
> installed\n--> Finished Dependency Resolution\n\nDependencies
> Resolved\n\n================================================================================\n
> Package Arch Version Repository
> Size\n================================================================================\nInstalling:\n
> marathon x86_64 1.3.0-1.0.506.el7 mesosphere
> 17 M\n\nTransaction
> Summary\n================================================================================\nInstall
> 1 Package\n\nTotal download size: 17 M\nInstalled size: 89 M\nDownloading
> packages:\nRunning transaction check\nRunning transaction test\nTransaction
> test succeeded\nRunning transaction\n Installing :
> marathon-1.3.0-1.0.506.el7.x86_64 1/1 \nerror:
> unpacking of archive failed on file /usr/bin/marathon;57daffff: cpio:
> read\n Verifying :
> marathon-1.3.0-1.0.506.el7.x86_64 1/1
> \n\nFailed:\n marathon.x86_64
> 0:1.3.0-1.0.506.el7
> \n\nComplete!\n"]}
>
> failed: [52.87.235.79] (item=[u'mesos', u'marathon']) => {"changed": true,
> "failed": true, "item": ["mesos", "marathon"], "msg": "Error unpacking rpm
> package marathon-1.3.0-1.0.506.el7.x86_64\n", "rc": 1, "results": ["All
> packages providing mesos are up to date", "Loaded plugins:
> fastestmirror\nLoading mirror speeds from cached hostfile\n * base:
> mirrors.tripadvisor.com\n * extras: mirrors.evowise.com\n * updates:
> mirrors.greenmountainaccess.net\nResolving Dependencies\n--> Running
> transaction check\n---> Package marathon.x86_64 0:1.3.0-1.0.506.el7 will be
> installed\n--> Finished Dependency Resolution\n\nDependencies
> Resolved\n\n================================================================================\n
> Package Arch Version Repository
> Size\n================================================================================\nInstalling:\n
> marathon x86_64 1.3.0-1.0.506.el7 mesosphere
> 17 M\n\nTransaction
> Summary\n================================================================================\nInstall
> 1 Package\n\nTotal download size: 17 M\nInstalled size: 89 M\nDownloading
> packages:\nRunning transaction check\nRunning transaction test\nTransaction
> test succeeded\nRunning transaction\n Installing :
> marathon-1.3.0-1.0.506.el7.x86_64 1/1 \nerror:
> unpacking of archive failed on file /usr/bin/marathon;57daffff: cpio:
> read\n Verifying :
> marathon-1.3.0-1.0.506.el7.x86_64 1/1
> \n\nFailed:\n marathon.x86_64
> 0:1.3.0-1.0.506.el7
> \n\nComplete!\n"]}
>
> failed: [54.167.94.186] (item=[u'mesos', u'marathon']) => {"changed":
> true, "failed": true, "item": ["mesos", "marathon"], "msg": "Error
> unpacking rpm package marathon-1.3.0-1.0.506.el7.x86_64\n", "rc": 1,
> "results": ["All packages providing mesos are up to date", "Loaded plugins:
> fastestmirror\nLoading mirror speeds from cached hostfile\n * base:
> mirrors.tripadvisor.com\n * extras: mirrors.evowise.com\n * updates:
> mirrors.greenmountainaccess.net\nResolving Dependencies\n--> Running
> transaction check\n---> Package marathon.x86_64 0:1.3.0-1.0.506.el7 will be
> installed\n--> Finished Dependency Resolution\n\nDependencies
> Resolved\n\n================================================================================\n
> Package Arch Version Repository
> Size\n================================================================================\nInstalling:\n
> marathon x86_64 1.3.0-1.0.506.el7 mesosphere
> 17 M\n\nTransaction
> Summary\n================================================================================\nInstall
> 1 Package\n\nTotal download size: 17 M\nInstalled size: 89 M\nDownloading
> packages:\nRunning transaction check\nRunning transaction test\nTransaction
> test succeeded\nRunning transaction\n Installing :
> marathon-1.3.0-1.0.506.el7.x86_64 1/1 \nerror:
> unpacking of archive failed on file /usr/bin/marathon;57daffff: cpio:
> read\n Verifying :
> marathon-1.3.0-1.0.506.el7.x86_64 1/1
> \n\nFailed:\n marathon.x86_64
> 0:1.3.0-1.0.506.el7
> \n\nComplete!\n"]}
>
> NO MORE HOSTS LEFT *************************************************************
>
> RUNNING HANDLER [zookeeper : restart zookeeper] ********************************
>
> [WARNING]: Could not create retry file 'site.retry'. [Errno 2] No
> such file or directory: ''
>
> PLAY RECAP *********************************************************************
> 52.87.235.79               : ok=17   changed=2    unreachable=0    failed=1
> 52.91.152.1                : ok=17   changed=2    unreachable=0    failed=1
> 54.167.94.186              : ok=17   changed=2    unreachable=0    failed=1
> localhost                  : ok=1    changed=0    unreachable=0    failed=0
>
> Is there some step that I am missing? It looks like the instances are not
> able to communicate because of the firewall? This is just a wild guess. Any
> help here is appreciated.
>
> Thanks and Regards,
> Gourav Shenoy
>
> From: Shameera Rathnayaka <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Monday, September 12, 2016 at 11:19 AM
> To: dev <[email protected]>
> Subject: Spinup Mesos-Marathon Cluster for Hybrid Scheduling
>
> Hi Dev,
>
> As part of the effort to use cloud infrastructure to run MPI and big data
> jobs with Airavata, we use Apache Mesos as the resource allocation
> framework to manage different types of clusters (i.e., an HPC node cluster
> to run MPI jobs, and Spark and Hadoop clusters to run big data
> applications). I came up with an Ansible script to spin up a Mesos cluster
> on the target set of nodes. You can find the script here:
> https://github.com/shamrath/mesos-deployment
> I am thinking of moving this code to Airavata if all agree. I would be
> happy to answer any questions related to this.
>
> Thanks,
> Shameera.
>
> --
> Shameera Rathnayaka

--
Shameera Rathnayaka
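
P.S. The /etc/hosts workaround described in item 2 of the quoted thread (adding a host entry on each master node so that mesos-resolve can resolve the local hostname) could be scripted roughly as below. This is only a minimal sketch and is not part of the mesos-deployment playbooks: the function names hostname_to_ip and ensure_hosts_entry are hypothetical, and it assumes the hostname has the ip-A-B-C-D form seen in the error message, i.e. that it encodes the private IP.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not from the playbooks.

# Convert an EC2-style hostname (ip-172-30-1-197) to its dotted IP.
# Prints nothing if the name does not match the assumed ip-A-B-C-D form.
hostname_to_ip() {
    printf '%s\n' "$1" |
        sed -n 's/^ip-\([0-9][0-9]*\)-\([0-9][0-9]*\)-\([0-9][0-9]*\)-\([0-9][0-9]*\)$/\1.\2.\3.\4/p'
}

# ensure_hosts_entry HOSTS_FILE HOSTNAME
# Appends "IP HOSTNAME" to HOSTS_FILE unless HOSTNAME is already listed,
# so running it repeatedly does not create duplicate entries.
ensure_hosts_entry() {
    hosts_file=$1
    name=$2
    ip=$(hostname_to_ip "$name")
    if [ -z "$ip" ]; then
        echo "unexpected hostname format: $name" >&2
        return 1
    fi
    # The name passed the format check, so it is safe inside the regex.
    if grep -qE "(^|[[:space:]])$name([[:space:]]|\$)" "$hosts_file" 2>/dev/null; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}
```

On a master node this would be called as root with /etc/hosts and the node's hostname as arguments, for example ensure_hosts_entry /etc/hosts ip-172-30-1-197. Whether the hostname command on a given image returns the short ip-... name is an assumption to check first.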
