Hi Gourav,

This is a known issue. I have already mentioned the workaround in the
project README file; see below:


   1. Set valid AWS credentials in roles/ec2/vars/aws-credential.yml. If that
      doesn't work, add the following to the ec2 task in
      roles/ec2/tasks/main.yml:

      aws_access_key: <your_valid_access_key>
      aws_secret_key: <your_valid_secret_key>
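
As a quick pre-flight check, here is a minimal shell sketch. It assumes the keys are supplied through the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, which boto and the Ansible ec2 module fall back to when no keys are passed to the task. The function name check_aws_env is hypothetical, and whether the environment variables alone are enough for this playbook is an assumption.

```shell
# Hypothetical helper: fail early if the AWS credential variables are unset,
# instead of waiting for the "No handler was ready to authenticate" error.
check_aws_env() {
  if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must both be set" >&2
    return 1
  fi
}

# Typical use (placeholders, not real keys):
#   export AWS_ACCESS_KEY_ID=<your_valid_access_key>
#   export AWS_SECRET_ACCESS_KEY=<your_valid_secret_key>
#   check_aws_env && ansible-playbook -i hosts site.yml -t "ec2"
```

The check only confirms that both variables are non-empty; it does not validate the keys against AWS.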


Regards,
Shameera.


On Wed, Sep 21, 2016 at 6:26 PM Shenoy, Gourav Ganesh <goshe...@indiana.edu>
wrote:

> Hi dev,
>
>
>
> I just hit another problem with the ansible script for mesos-deployment.
> This issue is related to creating instances in ec2 using the ansible
> playbook. The fix is mentioned later below.
>
>
>
> In particular, when you run the command (which would spin up 4 machines in
> EC2):
>
> ansible-playbook -i hosts site.yml -t "ec2"
>
>
>
> you might see the below authentication error:
>
>
>
> TASK [ec2 : create a aws instace/s]
> ********************************************
>
> failed: [localhost] (item=gs-mesos-master-1) => {"failed": true, "item":
> "gs-mesos-master-1", "msg": "No handler was ready to authenticate. 1
> handlers were checked. ['HmacAuthV4Handler'] Check your credentials"}
>
> failed: [localhost] (item=gs-mesos-master-2) => {"failed": true, "item":
> "gs-mesos-master-2", "msg": "No handler was ready to authenticate. 1
> handlers were checked. ['HmacAuthV4Handler'] Check your credentials"}
>
> failed: [localhost] (item=gs-mesos-master-3) => {"failed": true, "item":
> "gs-mesos-master-3", "msg": "No handler was ready to authenticate. 1
> handlers were checked. ['HmacAuthV4Handler'] Check your credentials"}
>
> failed: [localhost] (item=gs-mesos-slave-1) => {"failed": true, "item":
> "gs-mesos-slave-1", "msg": "No handler was ready to authenticate. 1
> handlers were checked. ['HmacAuthV4Handler'] Check your credentials"}
>
>
>
> This is because the ansible playbook is not able to authenticate the user,
> even if you have updated the “roles/ec2/vars/aws-credential.yml” file with
> your AWS access & secret keys.
>
>
>
> I was able to resolve this issue by adding the aws_access_key and
> aws_secret_key lines shown below to the “roles/ec2/tasks/main.yml” file, which
> runs the task of creating the EC2 instances.
>
>
>
> - name: create a aws instace/s
>   ec2:
>     aws_access_key: "{{aws_access_key}}"
>     aws_secret_key: "{{aws_secret_key}}"
>     key_name: "{{ key_name }}"
>     region: us-east-1
>
>
>
> Basically, this ansible task had no way of knowing the user credentials
> when it tried to create the instance(s), hence the error. Hope this helps!
>
>
>
> @Shameera,
>
> Is this a valid fix? If yes, could you update the ansible script? Thanks
> in advance.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *Suresh Marru <sma...@apache.org>
> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
> *Date: *Friday, September 16, 2016 at 11:02 PM
> *To: *Airavata Dev <dev@airavata.apache.org>
> *Subject: *Re: Spinup Mesos-Marathon Cluster for Hybrid Scheduling
>
>
>
> Hi Gourav,
>
>
>
> Thank you for this excellent communication. I hope others will follow suit
> with such mailing list updates. When you post such nontrivial diagnoses to
> the mailing list, others having trouble will be able to search for this
> thread and follow the steps to fix their issues.
>
>
>
> Hoping to see a lot more dev list threads similar to this one.
>
>
>
> Suresh
>
>
>
> On Sep 16, 2016, at 10:16 PM, Shenoy, Gourav Ganesh <goshe...@indiana.edu>
> wrote:
>
>
>
> Hi dev,
>
>
>
> I finally managed to get the mesos-marathon cluster up & running using the
> Ansible script. There were a couple of issues because of which things were
> failing. I have listed the problems faced during installation & the
> solutions that fixed things for me.
>
>
>
> 1. Marathon was not getting installed. This is because Marathon
> released a new build (version: 1.3.0-1.0.506.el7) 2 days ago and
> apparently the RPM for this version is corrupt, so a plain “yum
> install marathon” fails. To get around this, I listed all versions of
> marathon present in the repository.
> # yum --showduplicates list marathon | expand
> marathon.x86_64                 1.1.3-1.0.503.el7                 mesosphere
> marathon.x86_64                 1.3.0-1.0.506.el7                 mesosphere
>
> The next most recent version was 1.1.3-1.0.503.el7, which seemed stable, so
> I updated the ansible task to use this version of marathon.
>
> In “roles/mesos-master/tasks/main.yml” I updated the following:
> - name: install mesos and marathon
>   yum:
>     name: "{{ item }}"
>   with_items:
>     - mesos
>     - marathon-1.1.3-1.0.503.el7
>
>
> The mesos-marathon cluster installation worked perfectly fine after this
> change.
>
>
>
> 2. Even after this, the command “mesos-resolve `cat /etc/mesos/zk`”
> was failing with the error:
> Failed to obtain the IP address for 'ip-172-30-1-197'; the DNS service may
> not be able to resolve it: Name or service not known
>
> Apparently it couldn’t resolve the hostname of the local master machine.
> I resolved this issue by adding a host entry on each master node.
> E.g., on the master node which threw the above error, I added this host
> entry to /etc/hosts:
> 172.30.1.197       ip-172-30-1-197
>
>
>
> After this I was able to get the master-ip and visit the mesos dashboard
> (master-ip:5050)
>
>
>
> 3. I noticed that although I was able to view the mesos dashboard,
> I couldn’t access the marathon dashboard. The connection to
> <master-ip>:8080 was being refused. I then restarted the marathon service
> on the master node (sudo service marathon restart). After this I was able
> to access the marathon dashboard as well.
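
For item 2 above, the host entry can be derived from the node's private IP. Here is a minimal shell sketch, assuming the hostname follows the ip-A-B-C-D pattern seen in the error message. The function names hosts_entry and ensure_hosts_entry are hypothetical.

```shell
# Build the hosts-file line for a private IPv4 address, assuming the
# EC2-style hostname pattern ip-A-B-C-D (dots replaced by dashes).
hosts_entry() {
  printf '%s %s\n' "$1" "ip-$(printf '%s' "$1" | tr '.' '-')"
}

# Append the entry to the given hosts file only if that exact line is
# not already present, so repeated runs do not create duplicates.
ensure_hosts_entry() {
  file="$1"
  entry="$(hosts_entry "$2")"
  grep -qxF "$entry" "$file" 2>/dev/null || printf '%s\n' "$entry" >> "$file"
}
```

On a master node this would be run as root, e.g. ensure_hosts_entry /etc/hosts 172.30.1.197; running it twice leaves a single entry.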
>
>
>
> Hope this helps!
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *"Shenoy, Gourav Ganesh" <goshe...@indiana.edu>
> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
> *Date: *Friday, September 16, 2016 at 3:52 PM
> *To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
> *Subject: *Re: Spinup Mesos-Marathon Cluster for Hybrid Scheduling
>
>
>
> Hi Shameera,
>
>
>
> As discussed, after commenting out the “marathon” section the ansible
> playbooks execute without errors. But when I try to get the master-ip using
> “mesos-resolve”, I get an error:
>
>
>
> I SSH’ed into one of the master machines and checked the status of
> the mesos-master service; it seems the service is in a failed state. See
> the trace below:
>
>
>
> [centos@ip-172-30-1-39 ~]$ sudo service mesos-master status
>
> Redirecting to /bin/systemctl status  mesos-master.service
>
> ● mesos-master.service - Mesos Master
>
>    Loaded: loaded (/usr/lib/systemd/system/mesos-master.service; enabled;
> vendor preset: disabled)
>
>    Active: activating (auto-restart) (Result: exit-code) since Fri
> 2016-09-16 19:46:37 UTC; 18s ago
>
>   Process: 12608 ExecStart=/usr/bin/mesos-init-wrapper master *(code=exited,
> status=1/FAILURE)*
>
> Main PID: 12608 (code=exited, status=1/FAILURE)
>
>
>
> Sep 16 19:46:37 ip-172-30-1-39 systemd[1]: *Unit mesos-master.service
> entered failed state.*
>
> Sep 16 19:46:37 ip-172-30-1-39 systemd[1]: *mesos-master.service failed.*
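
The status output above reports the failure but not its cause. Here is a minimal shell sketch for pulling the unit's recent journal entries. It assumes the node logs through systemd's journal, and show_unit_log and RUNNER are hypothetical names.

```shell
# Print recent journal entries for a systemd unit (default: last 50 lines).
# RUNNER is a hypothetical hook: leave it unset to run via sudo, or set it
# to "echo" to preview the command without executing it.
show_unit_log() {
  unit="$1"
  lines="${2:-50}"
  ${RUNNER:-sudo} journalctl -u "$unit" --no-pager -n "$lines"
}
```

For example, show_unit_log mesos-master 100 on the failing master would show the last 100 journal lines for the service.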
>
>
>
> Hope this helps debugging the problem.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *Suresh Marru <sma...@apache.org>
> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
> *Date: *Friday, September 16, 2016 at 9:30 AM
> *To: *Airavata Dev <dev@airavata.apache.org>
> *Subject: *Re: Spinup Mesos-Marathon Cluster for Hybrid Scheduling
>
>
>
> Hi Shameera,
>
>
>
> All of these are great directions for Airavata, thank you for pushing the
> Ansible and Mesos deployments on the clouds. I think it will be better if
> we get your scripts into the Airavata repo and all of us collectively work on
> them. It looks like at least Pankaj and Gourav will also be able to contribute
> in addition to you.
>
>
>
> Suresh
>
>
>
> On Sep 15, 2016, at 8:59 PM, Shenoy, Gourav Ganesh <goshe...@indiana.edu>
> wrote:
>
>
>
> Sure, thanks Shameera. I will try this.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *Shameera Rathnayaka <shameerai...@gmail.com>
> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
> *Date: *Thursday, September 15, 2016 at 8:55 PM
> *To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
> *Subject: *Re: Spinup Mesos-Marathon Cluster for Hybrid Scheduling
>
>
>
> Interesting, I am also getting the same issue. The same script worked
> perfectly yesterday. I suspect some issue with the marathon rpm. With the
> marathon installation removed, Mesos gets installed without any issue.
>
>
>
> To remove the marathon installation, make the following changes to the
> /roles/mesos-master/tasks/main.yml file:
>
>
>
> 1. Comment out marathon in the "install mesos and marathon" task.
>
> 2. Comment out the last task, which starts marathon.
>
>
>
> Meanwhile, I will try to find the exact reason.
>
>
>
> ~ Shameera.
>
>
>
> On Thu, Sep 15, 2016 at 8:32 PM Shenoy, Gourav Ganesh <
> goshe...@indiana.edu> wrote:
>
> Hi Shameera,
>
>
>
> I am using the same image which you used (centos_ami_7_2: ami-6d1c2007).
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *Shameera Rathnayaka <shameerai...@gmail.com>
> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
> *Date: *Thursday, September 15, 2016 at 8:26 PM
> *To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
> *Subject: *Re: Spinup Mesos-Marathon Cluster for Hybrid Scheduling
>
>
>
> Hi Gourav,
>
>
>
> According to the error, something went wrong while unpacking the marathon
> bundle, see:  Installing :
> marathon-1.3.0-1.0.506.el7.x86_64                            1/1 \nerror:
> unpacking of archive failed on file /usr/bin/marathon;57daffff: cpio:
> read\n  Verifying  :
> marathon-1.3.0-1.0.506.el7.x86_64                            1/1
> \n\nFailed:\n  marathon.x86_64 0:1.3.0-1.0.506.el7
>
>
>
> What OS image and version did you use to create the instances? I tested with
> CentOS 7.2 and it works fine.
>
>
>
> ~ Shameera.
>
>
>
>
>
> On Thu, Sep 15, 2016 at 8:14 PM Shenoy, Gourav Ganesh <
> goshe...@indiana.edu> wrote:
>
> Hi Shameera,
>
>
>
> I am trying to build a mesos cluster on EC2 using your playbooks. But I am
> facing some issues. Please find the details below:
>
>
>
> *Details:*
>
> - I created 4 instances on EC2 (us-east-1 region) using the
> cloud-provisioning module (CloudBridge python). Of the 4, 3 were meant
> to be mesos masters & 1 a slave.
> *Note: The instance inbound & outbound traffic is wide open.*
>
> - I skipped step-1 & step-2 in your README, since I manually
> provisioned the instances. Next, I updated the “hosts” file with the public
> IPs of all 4 instances, and also updated the “roles/zookeeper/vars/main.yml”
> file with the private IPs of the 3 master instances.
>
> - I executed the “ansible-playbook -i hosts site.yml -t
> "mesos-master"” command, and I got the following error:
>
>
>
> TASK [mesos-master : install firewalld]
> ****************************************
>
> ok: [52.91.152.1]
>
> ok: [52.87.235.79]
>
> ok: [54.167.94.186]
>
>
>
> TASK [mesos-master : start firewalld]
> ******************************************
>
> ok: [52.91.152.1]
>
> ok: [52.87.235.79]
>
> ok: [54.167.94.186]
>
>
>
> TASK [mesos-master : open ports]
> ***********************************************
>
> ok: [52.91.152.1] => (item=5050/tcp)
>
> ok: [52.87.235.79] => (item=5050/tcp)
>
> ok: [54.167.94.186] => (item=5050/tcp)
>
> ok: [52.87.235.79] => (item=8080/tcp)
>
> ok: [54.167.94.186] => (item=8080/tcp)
>
> ok: [52.91.152.1] => (item=8080/tcp)
>
>
>
> TASK [mesos-master : install utility - TODO delete this]
> ***********************
>
> ok: [52.91.152.1] => (item=[u'vim'])
>
> ok: [52.87.235.79] => (item=[u'vim'])
>
> ok: [54.167.94.186] => (item=[u'vim'])
>
>
>
> TASK [mesos-master : add mesosphere rpm]
> ***************************************
>
> ok: [52.91.152.1]
>
> ok: [52.87.235.79]
>
> ok: [54.167.94.186]
>
>
>
> TASK [mesos-master : install mesos and marathon]
> *******************************
>
> failed: [52.91.152.1] (item=[u'mesos', u'marathon']) => {"changed": true,
> "failed": true, "item": ["mesos", "marathon"], "msg": "Error unpacking rpm
> package marathon-1.3.0-1.0.506.el7.x86_64\n", "rc": 1, "results": ["All
> packages providing mesos are up to date", "Loaded plugins:
> fastestmirror\nLoading mirror speeds from cached hostfile\n * base:
> mirrors.tripadvisor.com\n * extras: centos.hostingxtreme.com\n * updates:
> mirrors.greenmountainaccess.net\nResolving Dependencies\n--> Running
> transaction check\n---> Package marathon.x86_64 0:1.3.0-1.0.506.el7 will be
> installed\n--> Finished Dependency Resolution\n\nDependencies
> Resolved\n\n================================================================================\n
> Package         Arch          Version                  Repository
> Size\n================================================================================\nInstalling:\n
> marathon        x86_64        1.3.0-1.0.506.el7        mesosphere
> 17 M\n\nTransaction
> Summary\n================================================================================\nInstall
> 1 Package\n\nTotal download size: 17 M\nInstalled size: 89 M\nDownloading
> packages:\nRunning transaction check\nRunning transaction test\nTransaction
> test succeeded\nRunning transaction\n  Installing :
> marathon-1.3.0-1.0.506.el7.x86_64                            1/1 \nerror:
> unpacking of archive failed on file /usr/bin/marathon;57daffff: cpio:
> read\n  Verifying  :
> marathon-1.3.0-1.0.506.el7.x86_64                            1/1
> \n\nFailed:\n  marathon.x86_64
> 0:1.3.0-1.0.506.el7
> \n\nComplete!\n"]}
>
> failed: [52.87.235.79] (item=[u'mesos', u'marathon']) => {"changed": true,
> "failed": true, "item": ["mesos", "marathon"], "msg": "Error unpacking rpm
> package marathon-1.3.0-1.0.506.el7.x86_64\n", "rc": 1, "results": ["All
> packages providing mesos are up to date", "Loaded plugins:
> fastestmirror\nLoading mirror speeds from cached hostfile\n * base:
> mirrors.tripadvisor.com\n * extras: mirrors.evowise.com\n * updates:
> mirrors.greenmountainaccess.net\nResolving Dependencies\n--> Running
> transaction check\n---> Package marathon.x86_64 0:1.3.0-1.0.506.el7 will be
> installed\n--> Finished Dependency Resolution\n\nDependencies
> Resolved\n\n================================================================================\n
> Package         Arch          Version                  Repository
> Size\n================================================================================\nInstalling:\n
> marathon        x86_64        1.3.0-1.0.506.el7        mesosphere
> 17 M\n\nTransaction
> Summary\n================================================================================\nInstall
> 1 Package\n\nTotal download size: 17 M\nInstalled size: 89 M\nDownloading
> packages:\nRunning transaction check\nRunning transaction test\nTransaction
> test succeeded\nRunning transaction\n  Installing :
> marathon-1.3.0-1.0.506.el7.x86_64                            1/1 \nerror:
> unpacking of archive failed on file /usr/bin/marathon;57daffff: cpio:
> read\n  Verifying  :
> marathon-1.3.0-1.0.506.el7.x86_64                            1/1
> \n\nFailed:\n  marathon.x86_64
> 0:1.3.0-1.0.506.el7
> \n\nComplete!\n"]}
>
> failed: [54.167.94.186] (item=[u'mesos', u'marathon']) => {"changed":
> true, "failed": true, "item": ["mesos", "marathon"], "msg": "Error
> unpacking rpm package marathon-1.3.0-1.0.506.el7.x86_64\n", "rc": 1,
> "results": ["All packages providing mesos are up to date", "Loaded plugins:
> fastestmirror\nLoading mirror speeds from cached hostfile\n * base:
> mirrors.tripadvisor.com\n * extras: mirrors.evowise.com\n * updates:
> mirrors.greenmountainaccess.net\nResolving Dependencies\n--> Running
> transaction check\n---> Package marathon.x86_64 0:1.3.0-1.0.506.el7 will be
> installed\n--> Finished Dependency Resolution\n\nDependencies
> Resolved\n\n================================================================================\n
> Package         Arch          Version                  Repository
> Size\n================================================================================\nInstalling:\n
> marathon        x86_64        1.3.0-1.0.506.el7        mesosphere
> 17 M\n\nTransaction
> Summary\n================================================================================\nInstall
> 1 Package\n\nTotal download size: 17 M\nInstalled size: 89 M\nDownloading
> packages:\nRunning transaction check\nRunning transaction test\nTransaction
> test succeeded\nRunning transaction\n  Installing :
> marathon-1.3.0-1.0.506.el7.x86_64                            1/1 \nerror:
> unpacking of archive failed on file /usr/bin/marathon;57daffff: cpio:
> read\n  Verifying  :
> marathon-1.3.0-1.0.506.el7.x86_64                            1/1
> \n\nFailed:\n  marathon.x86_64
> 0:1.3.0-1.0.506.el7
> \n\nComplete!\n"]}
>
>
>
> NO MORE HOSTS LEFT
> *************************************************************
>
>
>
> RUNNING HANDLER [zookeeper : restart zookeeper]
> ********************************
>
> *[WARNING]: Could not create retry file 'site.retry'.         [Errno 2] No
> such file or directory: ''*
>
>
>
>
>
> PLAY RECAP
> *********************************************************************
>
> 52.87.235.79               : ok=17   changed=2    unreachable=0
> failed=1
>
> 52.91.152.1                : ok=17   changed=2    unreachable=0
> failed=1
>
> 54.167.94.186              : ok=17   changed=2    unreachable=0
> failed=1
>
> localhost                  : ok=1    changed=0    unreachable=0    failed=0
>
>
>
> Is there some step that I am missing? It looks like the instances are not
> able to communicate because of the firewall? This is just a wild guess. Any
> help here is appreciated.
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *Shameera Rathnayaka <shameerai...@gmail.com>
> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
> *Date: *Monday, September 12, 2016 at 11:19 AM
> *To: *dev <dev@airavata.apache.org>
> *Subject: *Spinup Mesos-Marathon Cluster for Hybrid Scheduling
>
>
>
> Hi Dev,
>
>
>
> As part of the effort to use cloud infrastructure to run MPI and big data
> jobs with Airavata, we use Apache Mesos as the resource allocation framework
> to manage different types of clusters (i.e., an HPC node cluster to run MPI
> jobs, and Spark and Hadoop clusters to run big data applications). I came up
> with an Ansible script to spin up a Mesos cluster on a target set of nodes.
> You can find the script here: https://github.com/shamrath/mesos-deployment
> I am thinking of moving this code to Airavata if all agree. I would be happy
> to answer any questions related to this.
>
>
>
> Thanks,
>
> Shameera.
>
> --
>
> Shameera Rathnayaka
>
>
>
>
>
>
-- 
Shameera Rathnayaka
