Hi Devs,

As there were no objections, I imported my local Mesos Ansible script
from https://github.com/shamrath/mesos-deployment, which is already under
the Apache 2.0 License, into Apache Airavata, preserving all history.

Thanks,
Shameera.

On Wed, Sep 21, 2016 at 7:13 PM Shameera Rathnayaka <shameerai...@gmail.com>
wrote:

> Hi Gourav,
>
> This is a known issue. I have already mentioned the above workaround in the
> project README file; see below:
>
>
>    1. Set valid AWS credentials in roles/ec2/vars/aws-credential.yml. If it
>    doesn't work, add the following to the ec2 task in
>    roles/ec2/tasks/main.yml:
>
>    aws_access_key: <your_valid_access_key>
>
>    aws_secret_key: <your_valid_secret_key>
>
>
> Regards,
> Shameera.
>
>
> On Wed, Sep 21, 2016 at 6:26 PM Shenoy, Gourav Ganesh <
> goshe...@indiana.edu> wrote:
>
>> Hi dev,
>>
>>
>>
>> I just hit another problem with the ansible script for mesos-deployment.
>> This issue is related to creating instances in ec2 using the ansible
>> playbook. The fix is mentioned later below.
>>
>>
>>
>> In particular, when you run the command (which would spin up 4 machines
>> in EC2):
>>
>> ansible-playbook -i hosts site.yml -t "ec2"
>>
>>
>>
>> you might see the below authentication error:
>>
>>
>>
>> TASK [ec2 : create a aws instace/s]
>> ********************************************
>>
>> failed: [localhost] (item=gs-mesos-master-1) => {"failed": true, "item":
>> "gs-mesos-master-1", "msg": "No handler was ready to authenticate. 1
>> handlers were checked. ['HmacAuthV4Handler'] Check your credentials"}
>>
>> failed: [localhost] (item=gs-mesos-master-2) => {"failed": true, "item":
>> "gs-mesos-master-2", "msg": "No handler was ready to authenticate. 1
>> handlers were checked. ['HmacAuthV4Handler'] Check your credentials"}
>>
>> failed: [localhost] (item=gs-mesos-master-3) => {"failed": true, "item":
>> "gs-mesos-master-3", "msg": "No handler was ready to authenticate. 1
>> handlers were checked. ['HmacAuthV4Handler'] Check your credentials"}
>>
>> failed: [localhost] (item=gs-mesos-slave-1) => {"failed": true, "item":
>> "gs-mesos-slave-1", "msg": "No handler was ready to authenticate. 1
>> handlers were checked. ['HmacAuthV4Handler'] Check your credentials"}
>>
>>
>>
>> This is because the ansible playbook is not able to authenticate the
>> user, even if you have updated the “roles/ec2/vars/aws-credential.yml” file
>> with your AWS access & secret keys.
>>
>>
>>
>> I was able to resolve this issue by adding the aws_access_key and
>> aws_secret_key lines shown below to the “roles/ec2/tasks/main.yml” file,
>> which runs the task of creating the EC2 instances.
>>
>>
>>
>> - name: create a aws instace/s
>>   ec2:
>>     aws_access_key: "{{ aws_access_key }}"
>>     aws_secret_key: "{{ aws_secret_key }}"
>>     key_name: "{{ key_name }}"
>>     region: us-east-1
>>
>>
>>
>> Basically, this ansible task had no way of knowing the user credentials
>> when it tried to create the instance(s), hence the error. Hope this helps!
>>
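A minimal sketch of an alternative to the task change above, not taken from
the repository. As far as I know, the Ansible ec2 module falls back to the
AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables when
aws_access_key and aws_secret_key are not set on the task. The helper name
have_aws_creds is hypothetical; it only checks that both variables are set
before the playbook is started.

```shell
# Hypothetical helper: succeed only if both AWS credential variables are
# set to non-empty values in the current shell.
have_aws_creds() {
  [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]
}

# Example use (commented out so that sourcing this snippet has no side effects):
#   export AWS_ACCESS_KEY_ID=<your_valid_access_key>
#   export AWS_SECRET_ACCESS_KEY=<your_valid_secret_key>
#   have_aws_creds && ansible-playbook -i hosts site.yml -t "ec2"
```

Exporting the two variables keeps the keys out of roles/ec2/tasks/main.yml,
at the cost of having to set them in every shell session.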
>>
>>
>> @Shameera,
>>
>> Is this a valid fix? If yes, could you update the ansible script? Thanks
>> in advance.
>>
>>
>>
>> Thanks and Regards,
>>
>> Gourav Shenoy
>>
>>
>>
>> *From: *Suresh Marru <sma...@apache.org>
>> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
>> *Date: *Friday, September 16, 2016 at 11:02 PM
>>
>>
>> *To: *Airavata Dev <dev@airavata.apache.org>
>> *Subject: *Re: Spinup Mesos-Marathon Cluster for Hybrid Scheduling
>>
>>
>>
>> Hi Gourav,
>>
>>
>>
>> Thank you for this excellent communication. Hope others will follow suit
>> with such mailing list updates. When you post such nontrivial diagnoses to
>> the mailing list, others having trouble will be able to search for this
>> thread and follow the steps to fix their problems.
>>
>>
>>
>> Hoping to see a lot more dev list threads similar to this one.
>>
>>
>>
>> Suresh
>>
>>
>>
>> On Sep 16, 2016, at 10:16 PM, Shenoy, Gourav Ganesh <goshe...@indiana.edu>
>> wrote:
>>
>>
>>
>> Hi dev,
>>
>>
>>
>> I finally managed to get the mesos-marathon cluster up & running using
>> the Ansible script. There were a couple of issues because of which things
>> were failing. I have listed the problems faced during installation & the
>> solutions that fixed things for me.
>>
>>
>>
>> 1.  Marathon was not getting installed. This is because Marathon released
>> a new build (version: 1.3.0-1.0.506.el7) 2 days back and apparently the RPM
>> for this version is corrupt, so a plain “yum install marathon” fails. To
>> get around this, I listed all versions of marathon present in the
>> repository.
>> # yum --showduplicates list marathon | expand
>> marathon.x86_64                 1.1.3-1.0.503.el7                 mesosphere
>> marathon.x86_64                 1.3.0-1.0.506.el7                 mesosphere
>>
>> The next most recent version was 1.1.3-1.0.503.el7, which seemed stable,
>> so I updated the ansible task to use this version of marathon.
>>
>> In “roles/mesos-master/tasks/main.yml” I updated the following:
>> - name: install mesos and marathon
>>   yum:
>>     name: "{{ item }}"
>>   with_items:
>>     - mesos
>>     - marathon-1.1.3-1.0.503.el7
>>
>>
>> The mesos-marathon cluster installation worked perfectly fine after this
>> change.
>>
>>
>>
>> 2.  Even after this, the command “mesos-resolve `cat /etc/mesos/zk`” was
>> failing with the error: “Failed to obtain the IP address for
>> 'ip-172-30-1-197'; the DNS service may not be able to resolve it: Name or
>> service not known”
>>
>> Apparently it couldn’t resolve the hostname of the local master machine.
>> I resolved this issue by adding a host entry on each master node.
>> E.g., on the master node that threw the above error, I added the following
>> entry to /etc/hosts:
>> 172.30.1.197       ip-172-30-1-197
>>
>>
>>
>> After this I was able to get the master-ip and visit the mesos dashboard
>> (master-ip:5050)
>>
>>
>>
>> 3.  I noticed that although I was able to view the mesos dashboard, I
>> couldn’t access the marathon dashboard. The connection to
>> <master-ip>:8080 was getting refused. I then restarted the marathon service
>> on the master node with “sudo service marathon restart”. After this I was
>> able to access the marathon dashboard as well.
>>
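For item 1 above, the choice of which version to pin can be sketched in
shell. This is an illustration only, not part of the playbook: pick_version
is a hypothetical helper that reads the output of
“yum --showduplicates list <package>” on standard input and prints the
newest listed version other than one known-bad version. It assumes GNU sort
(for -V) and that each package line has the package name in the first
column and the version in the second.

```shell
# pick_version PKG BAD: print the newest version of PKG found on stdin,
# ignoring the version string BAD. Header lines that do not start with
# "PKG." are skipped.
pick_version() {
  awk -v pkg="$1" -v bad="$2" \
    'index($1, pkg ".") == 1 && $2 != "" && $2 != bad { print $2 }' |
    sort -V | tail -n 1
}

# Example use (commented out; needs yum and the mesosphere repository):
#   yum --showduplicates list marathon | pick_version marathon 1.3.0-1.0.506.el7
```

The printed version can then be appended to the package name in the
with_items entry of the install task, in the form marathon-<version>.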
>>
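For item 2 above, the /etc/hosts line can be derived from the hostname
itself. A minimal sketch, assuming the EC2-style private hostname pattern
ip-A-B-C-D seen in the error message; hosts_entry is a hypothetical helper,
not something shipped with Mesos or the playbook.

```shell
# hosts_entry HOSTNAME: print "IP HOSTNAME" for names of the form ip-A-B-C-D.
# Returns non-zero for any hostname that does not match that pattern.
hosts_entry() {
  octet='\([0-9][0-9]*\)'
  ip=$(printf '%s\n' "$1" |
    sed -n "s/^ip-$octet-$octet-$octet-$octet\$/\1.\2.\3.\4/p")
  [ -n "$ip" ] || return 1
  printf '%s %s\n' "$ip" "$1"
}

# Example use on a master node (commented out; it modifies /etc/hosts):
#   hosts_entry "$(hostname)" | sudo tee -a /etc/hosts
```

Refusing hostnames that do not match the pattern avoids writing a wrong
entry on machines that use a different naming scheme.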
>>
>> Hope this helps!
>>
>>
>>
>> Thanks and Regards,
>>
>> Gourav Shenoy
>>
>>
>>
>> *From: *"Shenoy, Gourav Ganesh" <goshe...@indiana.edu>
>> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
>> *Date: *Friday, September 16, 2016 at 3:52 PM
>> *To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
>> *Subject: *Re: Spinup Mesos-Marathon Cluster for Hybrid Scheduling
>>
>>
>>
>> Hi Shameera,
>>
>>
>>
>> As discussed, after commenting out the “marathon” section the ansible
>> playbooks execute without errors. But when I try to get the master-ip using
>> “mesos-resolve”, I get an error:
>>
>>
>>
>> I SSH’ed into one of the master machines and checked the status of the
>> mesos-master service; it seems the service is in a failed state. See the
>> trace below:
>>
>>
>>
>> [centos@ip-172-30-1-39 ~]$ sudo service mesos-master status
>>
>> Redirecting to /bin/systemctl status  mesos-master.service
>>
>> ● mesos-master.service - Mesos Master
>>
>>    Loaded: loaded (/usr/lib/systemd/system/mesos-master.service; enabled;
>> vendor preset: disabled)
>>
>>    Active: activating (auto-restart) (Result: exit-code) since Fri
>> 2016-09-16 19:46:37 UTC; 18s ago
>>
>>   Process: 12608 ExecStart=/usr/bin/mesos-init-wrapper master *(code=exited,
>> status=1/FAILURE)*
>>
>> Main PID: 12608 (code=exited, status=1/FAILURE)
>>
>>
>>
>> Sep 16 19:46:37 ip-172-30-1-39 systemd[1]: *Unit mesos-master.service
>> entered failed state.*
>>
>> Sep 16 19:46:37 ip-172-30-1-39 systemd[1]: *mesos-master.service failed.*
>>
>>
>>
>> Hope this helps debugging the problem.
>>
>>
>>
>> Thanks and Regards,
>>
>> Gourav Shenoy
>>
>>
>>
>> *From: *Suresh Marru <sma...@apache.org>
>> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
>> *Date: *Friday, September 16, 2016 at 9:30 AM
>> *To: *Airavata Dev <dev@airavata.apache.org>
>> *Subject: *Re: Spinup Mesos-Marathon Cluster for Hybrid Scheduling
>>
>>
>>
>> Hi Shameera,
>>
>>
>>
>> All of these are great directions for Airavata, thank you for pushing the
>> Ansible and Mesos deployments on the clouds. I think it will be better if
>> we get your scripts into Airavata repo and all of us collectively work on
>> it. Looks like at least Pankaj and Gourav will also be able to contribute
>> in addition to you.
>>
>>
>>
>> Suresh
>>
>>
>>
>> On Sep 15, 2016, at 8:59 PM, Shenoy, Gourav Ganesh <goshe...@indiana.edu>
>> wrote:
>>
>>
>>
>> Sure, thanks Shameera. I will try this.
>>
>>
>>
>> Thanks and Regards,
>>
>> Gourav Shenoy
>>
>>
>>
>> *From: *Shameera Rathnayaka <shameerai...@gmail.com>
>> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
>> *Date: *Thursday, September 15, 2016 at 8:55 PM
>> *To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
>> *Subject: *Re: Spinup Mesos-Marathon Cluster for Hybrid Scheduling
>>
>>
>>
>> Interesting, I am also getting the same issue. The same script worked
>> perfectly yesterday. I suspect an issue with the marathon RPM. With the
>> marathon installation removed, Mesos gets installed without any issue.
>>
>>
>>
>> To remove the marathon installation, make the following changes to the
>> /roles/mesos-master/tasks/main.yml file:
>>
>>
>>
>> 1. Comment out marathon in the "install mesos and marathon" task.
>>
>> 2. Comment out the last task, which starts marathon.
>>
>>
>>
>> Meanwhile, I will try to find the exact reason.
>>
>>
>>
>> ~ Shameera.
>>
>>
>>
>> On Thu, Sep 15, 2016 at 8:32 PM Shenoy, Gourav Ganesh <
>> goshe...@indiana.edu> wrote:
>>
>> Hi Shameera,
>>
>>
>>
>> I am using the same image which you used (centos_ami_7_2: ami-6d1c2007).
>>
>>
>>
>> Thanks and Regards,
>>
>> Gourav Shenoy
>>
>>
>>
>> *From: *Shameera Rathnayaka <shameerai...@gmail.com>
>> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
>> *Date: *Thursday, September 15, 2016 at 8:26 PM
>> *To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
>> *Subject: *Re: Spinup Mesos-Marathon Cluster for Hybrid Scheduling
>>
>>
>>
>> Hi Gourav,
>>
>>
>>
>> According to the error, something went wrong while unpacking the marathon
>> bundle; see:  Installing :
>> marathon-1.3.0-1.0.506.el7.x86_64                            1/1 \nerror:
>> unpacking of archive failed on file /usr/bin/marathon;57daffff: cpio:
>> read\n  Verifying  :
>> marathon-1.3.0-1.0.506.el7.x86_64                            1/1
>> \n\nFailed:\n  marathon.x86_64 0:1.3.0-1.0.506.el7
>>
>>
>>
>> Which OS image and version did you use to create the instances? I tested
>> with CentOS 7.2 and it works fine.
>>
>>
>>
>> ~ Shameera.
>>
>>
>>
>>
>>
>> On Thu, Sep 15, 2016 at 8:14 PM Shenoy, Gourav Ganesh <
>> goshe...@indiana.edu> wrote:
>>
>> Hi Shameera,
>>
>>
>>
>> I am trying to build a mesos cluster on EC2 using your playbooks. But I
>> am facing some issues. Please find the details below:
>>
>>
>>
>> *Details:*
>>
>> -  I created 4 instances on EC2 (us-east-1 region) using the
>> cloud-provisioning module (CloudBridge python). Out of the 4, 3 were meant
>> to be mesos masters & 1 slave.
>> *Note: The instances' inbound & outbound traffic is wide open.*
>>
>> -  I skipped step-1 & step-2 in your README, since I manually
>> provisioned the instances. Next, I updated the “hosts” file with the public
>> IPs of all 4 instances, and also updated the
>> “roles/zookeeper/vars/main.yml” file with the private IPs of the 3 master
>> instances.
>>
>> -  I executed the “ansible-playbook -i hosts site.yml -t
>> "mesos-master"” command, and I got the following error:
>>
>>
>>
>> TASK [mesos-master : install firewalld]
>> ****************************************
>>
>> ok: [52.91.152.1]
>>
>> ok: [52.87.235.79]
>>
>> ok: [54.167.94.186]
>>
>>
>>
>> TASK [mesos-master : start firewalld]
>> ******************************************
>>
>> ok: [52.91.152.1]
>>
>> ok: [52.87.235.79]
>>
>> ok: [54.167.94.186]
>>
>>
>>
>> TASK [mesos-master : open ports]
>> ***********************************************
>>
>> ok: [52.91.152.1] => (item=5050/tcp)
>>
>> ok: [52.87.235.79] => (item=5050/tcp)
>>
>> ok: [54.167.94.186] => (item=5050/tcp)
>>
>> ok: [52.87.235.79] => (item=8080/tcp)
>>
>> ok: [54.167.94.186] => (item=8080/tcp)
>>
>> ok: [52.91.152.1] => (item=8080/tcp)
>>
>>
>>
>> TASK [mesos-master : install utility - TODO delete this]
>> ***********************
>>
>> ok: [52.91.152.1] => (item=[u'vim'])
>>
>> ok: [52.87.235.79] => (item=[u'vim'])
>>
>> ok: [54.167.94.186] => (item=[u'vim'])
>>
>>
>>
>> TASK [mesos-master : add mesosphere rpm]
>> ***************************************
>>
>> ok: [52.91.152.1]
>>
>> ok: [52.87.235.79]
>>
>> ok: [54.167.94.186]
>>
>>
>>
>> TASK [mesos-master : install mesos and marathon]
>> *******************************
>>
>> failed: [52.91.152.1] (item=[u'mesos', u'marathon']) => {"changed": true,
>> "failed": true, "item": ["mesos", "marathon"], "msg": "Error unpacking rpm
>> package marathon-1.3.0-1.0.506.el7.x86_64\n", "rc": 1, "results": ["All
>> packages providing mesos are up to date", "Loaded plugins:
>> fastestmirror\nLoading mirror speeds from cached hostfile\n * base:
>> mirrors.tripadvisor.com\n * extras: centos.hostingxtreme.com\n * updates:
>>  mirrors.greenmountainaccess.net\nResolving Dependencies\n--> Running
>> transaction check\n---> Package marathon.x86_64 0:1.3.0-1.0.506.el7 will be
>> installed\n--> Finished Dependency Resolution\n\nDependencies
>> Resolved\n\n================================================================================\n
>> Package         Arch          Version                  Repository
>> Size\n================================================================================\nInstalling:\n
>> marathon        x86_64        1.3.0-1.0.506.el7        mesosphere
>> 17 M\n\nTransaction
>> Summary\n================================================================================\nInstall
>> 1 Package\n\nTotal download size: 17 M\nInstalled size: 89 M\nDownloading
>> packages:\nRunning transaction check\nRunning transaction test\nTransaction
>> test succeeded\nRunning transaction\n  Installing :
>> marathon-1.3.0-1.0.506.el7.x86_64                            1/1 \nerror:
>> unpacking of archive failed on file /usr/bin/marathon;57daffff: cpio:
>> read\n  Verifying  :
>> marathon-1.3.0-1.0.506.el7.x86_64                            1/1
>> \n\nFailed:\n  marathon.x86_64
>> 0:1.3.0-1.0.506.el7
>> \n\nComplete!\n"]}
>>
>> failed: [52.87.235.79] (item=[u'mesos', u'marathon']) => {"changed":
>> true, "failed": true, "item": ["mesos", "marathon"], "msg": "Error
>> unpacking rpm package marathon-1.3.0-1.0.506.el7.x86_64\n", "rc": 1,
>> "results": ["All packages providing mesos are up to date", "Loaded plugins:
>> fastestmirror\nLoading mirror speeds from cached hostfile\n * base:
>> mirrors.tripadvisor.com\n * extras: mirrors.evowise.com\n * updates:
>> mirrors.greenmountainaccess.net\nResolving Dependencies\n--> Running
>> transaction check\n---> Package marathon.x86_64 0:1.3.0-1.0.506.el7 will be
>> installed\n--> Finished Dependency Resolution\n\nDependencies
>> Resolved\n\n================================================================================\n
>> Package         Arch          Version                  Repository
>> Size\n================================================================================\nInstalling:\n
>> marathon        x86_64        1.3.0-1.0.506.el7        mesosphere
>> 17 M\n\nTransaction
>> Summary\n================================================================================\nInstall
>> 1 Package\n\nTotal download size: 17 M\nInstalled size: 89 M\nDownloading
>> packages:\nRunning transaction check\nRunning transaction test\nTransaction
>> test succeeded\nRunning transaction\n  Installing :
>> marathon-1.3.0-1.0.506.el7.x86_64                            1/1 \nerror:
>> unpacking of archive failed on file /usr/bin/marathon;57daffff: cpio:
>> read\n  Verifying  :
>> marathon-1.3.0-1.0.506.el7.x86_64                            1/1
>> \n\nFailed:\n  marathon.x86_64
>> 0:1.3.0-1.0.506.el7
>> \n\nComplete!\n"]}
>>
>> failed: [54.167.94.186] (item=[u'mesos', u'marathon']) => {"changed":
>> true, "failed": true, "item": ["mesos", "marathon"], "msg": "Error
>> unpacking rpm package marathon-1.3.0-1.0.506.el7.x86_64\n", "rc": 1,
>> "results": ["All packages providing mesos are up to date", "Loaded plugins:
>> fastestmirror\nLoading mirror speeds from cached hostfile\n * base:
>> mirrors.tripadvisor.com\n * extras: mirrors.evowise.com\n * updates:
>> mirrors.greenmountainaccess.net\nResolving Dependencies\n--> Running
>> transaction check\n---> Package marathon.x86_64 0:1.3.0-1.0.506.el7 will be
>> installed\n--> Finished Dependency Resolution\n\nDependencies
>> Resolved\n\n================================================================================\n
>> Package         Arch          Version                  Repository
>> Size\n================================================================================\nInstalling:\n
>> marathon        x86_64        1.3.0-1.0.506.el7        mesosphere
>> 17 M\n\nTransaction
>> Summary\n================================================================================\nInstall
>> 1 Package\n\nTotal download size: 17 M\nInstalled size: 89 M\nDownloading
>> packages:\nRunning transaction check\nRunning transaction test\nTransaction
>> test succeeded\nRunning transaction\n  Installing :
>> marathon-1.3.0-1.0.506.el7.x86_64                            1/1 \nerror:
>> unpacking of archive failed on file /usr/bin/marathon;57daffff: cpio:
>> read\n  Verifying  :
>> marathon-1.3.0-1.0.506.el7.x86_64                            1/1
>> \n\nFailed:\n  marathon.x86_64
>> 0:1.3.0-1.0.506.el7
>> \n\nComplete!\n"]}
>>
>>
>>
>> NO MORE HOSTS LEFT
>> *************************************************************
>>
>>
>>
>> RUNNING HANDLER [zookeeper : restart zookeeper]
>> ********************************
>>
>> *[WARNING]: Could not create retry file 'site.retry'.         [Errno 2]
>> No such file or directory: ''*
>>
>>
>>
>>
>>
>> PLAY RECAP
>> *********************************************************************
>>
>> 52.87.235.79               : ok=17   changed=2    unreachable=0
>> failed=1
>>
>> 52.91.152.1                : ok=17   changed=2    unreachable=0
>> failed=1
>>
>> 54.167.94.186              : ok=17   changed=2    unreachable=0
>> failed=1
>>
>> localhost                  : ok=1    changed=0    unreachable=0
>> failed=0
>>
>>
>>
>> Is there some step that I am missing? It looks like the instances are not
>> able to communicate because of the firewall? This is just a wild guess. Any
>> help here is appreciated.
>>
>>
>> Thanks and Regards,
>>
>> Gourav Shenoy
>>
>>
>>
>> *From: *Shameera Rathnayaka <shameerai...@gmail.com>
>> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
>> *Date: *Monday, September 12, 2016 at 11:19 AM
>> *To: *dev <dev@airavata.apache.org>
>> *Subject: *Spinup Mesos-Marathon Cluster for Hybrid Scheduling
>>
>>
>>
>> Hi Dev,
>>
>>
>>
>> As part of the effort to use cloud infrastructure to run MPI and big data
>> jobs through Airavata, we use Apache Mesos as the resource allocation
>> framework to manage different types of clusters (i.e., HPC node clusters to
>> run MPI jobs, and Spark and Hadoop clusters to run big data applications).
>> I came up with an Ansible script to spin up a Mesos cluster on the target
>> set of nodes. You can find the script here:
>> https://github.com/shamrath/mesos-deployment
>> I am thinking of moving this code to Airavata if all agree. I would be
>> happy to answer any questions related to this.
>>
>>
>>
>> Thanks,
>>
>> Shameera.
>>
>> --
>>
>> Shameera Rathnayaka
-- 
Shameera Rathnayaka
