Hi Gourav,

Thank you for this excellent communication. I hope others will follow suit with 
such updates on the mailing lists. When you post a nontrivial diagnosis like this 
to the mailing lists, others who run into the same trouble can search for this 
thread and follow these steps to fix it. 

I hope to see many more dev list threads like this one.

Suresh

> On Sep 16, 2016, at 10:16 PM, Shenoy, Gourav Ganesh <goshe...@indiana.edu> 
> wrote:
> 
> Hi dev,
>  
> I finally managed to get the mesos-marathon cluster up & running using the 
> Ansible script. A couple of issues were causing the failures. I have listed 
> the problems I faced during installation & the solutions that fixed them for 
> me.
>  
> 1.  Marathon was not getting installed – Marathon released a new build 
> (version: 1.3.0-1.0.506.el7) two days ago, and the RPM for this version 
> appears to be corrupt, so a plain “yum install marathon” fails. To work 
> around this, I listed all versions of marathon present in the repository.
> # yum --showduplicates list marathon | expand
> marathon.x86_64                 1.1.3-1.0.503.el7                    mesosphere
> marathon.x86_64                 1.3.0-1.0.506.el7                    mesosphere
> 
> The next most recent version was 1.1.3-1.0.503.el7, which seemed stable, so 
> I updated the ansible task to pin marathon to this version.
> 
> In “roles/mesos-master/tasks/main.yml” I updated the following:
> - name: install mesos and marathon
>   yum:
>     name: "{{ item }}"
>   with_items:
>     - mesos
>     - marathon-1.1.3-1.0.503.el7
> 
> The mesos-marathon cluster installation worked perfectly fine after this 
> change. 
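
The version selection described above could also be scripted. This is a minimal sketch, assuming a `sort` that supports `-V` (GNU coreutils); the function name `pick_marathon_version` is hypothetical and not part of the playbook. It reads the output of `yum --showduplicates list marathon | expand` on stdin and prints the newest version other than the one passed in as the known-bad build:

```shell
#!/bin/sh
# Hypothetical helper: print the newest marathon version from
# "yum --showduplicates list marathon" output (read on stdin),
# skipping one version that is known to be broken.
pick_marathon_version() {
    bad="$1"
    # Keep only package rows (first column starts with "marathon."),
    # take the version column, drop the bad build, sort by version.
    awk '$1 ~ /^marathon\./ { print $2 }' \
        | grep -vxF -- "$bad" \
        | sort -V \
        | tail -n 1
}
```

For the listing above, `yum --showduplicates list marathon | expand | pick_marathon_version 1.3.0-1.0.506.el7` prints `1.1.3-1.0.503.el7`, which is the value used in the task.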
>  
> 2.       Even after this, the command “mesos-resolve `cat /etc/mesos/zk`” was 
> failing with the error: Failed to obtain the IP address for 'ip-172-30-1-197'; 
> the DNS service may not be able to resolve it: Name or service not known 
> 
> Apparently it couldn’t resolve the hostname of the local master machine. I 
> resolved this issue by adding a host entry on each master node. 
> For example, on the master node that threw the above error, I added this 
> host entry to /etc/hosts:
> 172.30.1.197       ip-172-30-1-197
>  
> After this I was able to get the master-ip and visit the mesos dashboard 
> (master-ip:5050)
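
The host-entry fix above can be applied repeatedly without creating duplicates. This is a minimal sketch, not what was actually run on the cluster; the function name `add_host_entry` and its optional third argument (an alternate hosts file, useful for a dry run) are assumptions made for illustration:

```shell
#!/bin/sh
# Hypothetical helper: append "<ip> <hostname>" to a hosts file unless the
# hostname is already mapped there. Defaults to /etc/hosts (needs root).
add_host_entry() {
    ip="$1"
    name="$2"
    hosts_file="${3:-/etc/hosts}"
    # Look for the hostname as a whole field on any non-comment line.
    if awk -v n="$name" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$hosts_file"; then
        return 0    # already present, nothing to do
    fi
    printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
}
```

On the master from the example this would be called as `add_host_entry 172.30.1.197 ip-172-30-1-197`, run as root on each master node with that node's own private IP and hostname.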
>  
> 3.       I noticed that although I was able to view the mesos dashboard, I 
> couldn’t access the marathon dashboard. The connection to <master-ip>:8080 
> was getting refused. I then restarted the marathon service on the master node 
> – sudo service marathon restart. After this I was able to access the marathon 
> dashboard as well.
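
After a restart, the service may take a little while before it accepts connections. The sketch below is a generic retry loop for checking this, not something taken from the playbook; the function name `retry` and the `MASTER_IP` variable in the usage note are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: run a command up to <attempts> times, sleeping
# <delay> seconds between tries. Returns 0 on the first success and
# 1 if every attempt fails.
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        # Sleep only when another attempt is still to come.
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

For example, after `sudo service marathon restart`, running `retry 10 3 curl -fsS -o /dev/null --max-time 5 "http://$MASTER_IP:8080/"` would probe the dashboard port up to ten times, three seconds apart, and report failure if it never answers.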
>  
> Hope this helps!
>  
> Thanks and Regards,
> Gourav Shenoy
>  
> From: "Shenoy, Gourav Ganesh" <goshe...@indiana.edu>
> Reply-To: "dev@airavata.apache.org" <dev@airavata.apache.org>
> Date: Friday, September 16, 2016 at 3:52 PM
> To: "dev@airavata.apache.org" <dev@airavata.apache.org>
> Subject: Re: Spinup Mesos-Marathon Cluster for Hybrid Scheduling
>  
> Hi Shameera,
>  
> As discussed, after commenting out the “marathon” section the ansible 
> playbooks execute without errors. But when I try to get the master-ip using 
> “mesos-resolve”, I get an error:
>  
> I SSH’ed into one of the master machines and checked the status of the 
> mesos-master service; the service appears to be in a failed state. See the 
> trace below:
>  
> [centos@ip-172-30-1-39 ~]$ sudo service mesos-master status
> Redirecting to /bin/systemctl status  mesos-master.service
> ● mesos-master.service - Mesos Master
>    Loaded: loaded (/usr/lib/systemd/system/mesos-master.service; enabled; vendor preset: disabled)
>    Active: activating (auto-restart) (Result: exit-code) since Fri 2016-09-16 19:46:37 UTC; 18s ago
>   Process: 12608 ExecStart=/usr/bin/mesos-init-wrapper master (code=exited, status=1/FAILURE)
> Main PID: 12608 (code=exited, status=1/FAILURE)
>  
> Sep 16 19:46:37 ip-172-30-1-39 systemd[1]: Unit mesos-master.service entered failed state.
> Sep 16 19:46:37 ip-172-30-1-39 systemd[1]: mesos-master.service failed.
>  
> Hope this helps debugging the problem.
>  
> Thanks and Regards,
> Gourav Shenoy
>  
> From: Suresh Marru <sma...@apache.org>
> Reply-To: "dev@airavata.apache.org" <dev@airavata.apache.org>
> Date: Friday, September 16, 2016 at 9:30 AM
> To: Airavata Dev <dev@airavata.apache.org>
> Subject: Re: Spinup Mesos-Marathon Cluster for Hybrid Scheduling
>  
> Hi Shameera, 
>  
> All of these are great directions for Airavata; thank you for pushing the 
> Ansible and Mesos deployments on the clouds. I think it will be better if we 
> get your scripts into the Airavata repo so that all of us can work on them 
> collectively. It looks like at least Pankaj and Gourav will also be able to 
> contribute, in addition to you. 
>  
> Suresh
>  
> On Sep 15, 2016, at 8:59 PM, Shenoy, Gourav Ganesh <goshe...@indiana.edu> 
> wrote:
>  
> Sure, thanks Shameera. I will try this.
>  
> Thanks and Regards,
> Gourav Shenoy
>  
> From: Shameera Rathnayaka <shameerai...@gmail.com>
> Reply-To: "dev@airavata.apache.org" <dev@airavata.apache.org>
> Date: Thursday, September 15, 2016 at 8:55 PM
> To: "dev@airavata.apache.org" <dev@airavata.apache.org>
> Subject: Re: Spinup Mesos-Marathon Cluster for Hybrid Scheduling
>  
> Interesting, I am also getting the same issue. The same script worked 
> perfectly yesterday. I suspect an issue with the marathon rpm. With the 
> marathon installation removed, Mesos gets installed without any issue. 
>  
> To remove the marathon installation, make the following changes to the 
> /roles/mesos-master/tasks/main.yml file:
>  
> 1. Comment out marathon in the "install mesos and marathon" task.
> 2. Comment out the last task, which starts marathon.
>  
> Meanwhile, I will try to find the exact reason.
>  
> ~ Shameera.
>  
> On Thu, Sep 15, 2016 at 8:32 PM Shenoy, Gourav Ganesh <goshe...@indiana.edu> 
> wrote:
> Hi Shameera,
>  
> I am using the same image which you used (centos_ami_7_2: ami-6d1c2007).
>  
> Thanks and Regards,
> Gourav Shenoy
>  
> From: Shameera Rathnayaka <shameerai...@gmail.com>
> Reply-To: "dev@airavata.apache.org" <dev@airavata.apache.org>
> Date: Thursday, September 15, 2016 at 8:26 PM
> To: "dev@airavata.apache.org" <dev@airavata.apache.org>
> Subject: Re: Spinup Mesos-Marathon Cluster for Hybrid Scheduling
>  
> Hi Gourav, 
>  
> According to the error, something went wrong while unpacking the marathon 
> bundle, see:  Installing : marathon-1.3.0-1.0.506.el7.x86_64                  
>           1/1 \nerror: unpacking of archive failed on file 
> /usr/bin/marathon;57daffff: cpio: read\n  Verifying  : 
> marathon-1.3.0-1.0.506.el7.x86_64                            1/1 
> \n\nFailed:\n  marathon.x86_64 0:1.3.0-1.0.506.el7
>  
> Which OS image and version did you use to create the instances? I tested 
> with centos 7.2 and it works fine. 
>  
> ~ Shameera.
>  
>  
> On Thu, Sep 15, 2016 at 8:14 PM Shenoy, Gourav Ganesh <goshe...@indiana.edu> 
> wrote:
> Hi Shameera,
>  
> I am trying to build a mesos cluster on EC2 using your playbooks. But I am 
> facing some issues. Please find the details below:
>  
> Details:
> -          I created 4 instances on EC2 (us-east-1 region) using the 
> cloud-provisioning module (CloudBridge python). Of the 4, 3 were meant to 
> be mesos masters & 1 a slave. 
> Note: The instances' inbound & outbound traffic is wide open.
> -          I skipped step-1 & step-2 in your README, since I manually 
> provisioned the instances. Next, I updated the “hosts” file with the public 
> IPs of all 4 instances, and also updated the “roles/zookeeper/vars/main.yml” 
> file with the private IPs of the 3 master instances.
> -          I executed the “ansible-playbook -i hosts site.yml -t 
> "mesos-master"” command, and I got the following error:
>  
> TASK [mesos-master : install firewalld] 
> ****************************************
> ok: [52.91.152.1]
> ok: [52.87.235.79]
> ok: [54.167.94.186]
>  
> TASK [mesos-master : start firewalld] 
> ******************************************
> ok: [52.91.152.1]
> ok: [52.87.235.79]
> ok: [54.167.94.186]
>  
> TASK [mesos-master : open ports] 
> ***********************************************
> ok: [52.91.152.1] => (item=5050/tcp)
> ok: [52.87.235.79] => (item=5050/tcp)
> ok: [54.167.94.186] => (item=5050/tcp)
> ok: [52.87.235.79] => (item=8080/tcp)
> ok: [54.167.94.186] => (item=8080/tcp)
> ok: [52.91.152.1] => (item=8080/tcp)
>  
> TASK [mesos-master : install utility - TODO delete this] 
> ***********************
> ok: [52.91.152.1] => (item=[u'vim'])
> ok: [52.87.235.79] => (item=[u'vim'])
> ok: [54.167.94.186] => (item=[u'vim'])
>  
> TASK [mesos-master : add mesosphere rpm] 
> ***************************************
> ok: [52.91.152.1]
> ok: [52.87.235.79]
> ok: [54.167.94.186]
>  
> TASK [mesos-master : install mesos and marathon] 
> *******************************
> failed: [52.91.152.1] (item=[u'mesos', u'marathon']) => {"changed": true, 
> "failed": true, "item": ["mesos", "marathon"], "msg": "Error unpacking rpm 
> package marathon-1.3.0-1.0.506.el7.x86_64\n", "rc": 1, "results": ["All 
> packages providing mesos are up to date", "Loaded plugins: 
> fastestmirror\nLoading mirror speeds from cached hostfile\n * base: 
> mirrors.tripadvisor.com <http://mirrors.tripadvisor.com/>\n * extras: 
> centos.hostingxtreme.com <http://centos.hostingxtreme.com/>\n * updates: 
> mirrors.greenmountainaccess.net 
> <http://mirrors.greenmountainaccess.net/>\nResolving Dependencies\n--> 
> Running transaction check\n---> Package marathon.x86_64 0:1.3.0-1.0.506.el7 
> will be installed\n--> Finished Dependency Resolution\n\nDependencies 
> Resolved\n\n================================================================================\n
>  Package         Arch          Version                  Repository         
> Size\n================================================================================\nInstalling:\n
>  marathon        x86_64        1.3.0-1.0.506.el7        mesosphere         17 
> M\n\nTransaction 
> Summary\n================================================================================\nInstall
>   1 Package\n\nTotal download size: 17 M\nInstalled size: 89 M\nDownloading 
> packages:\nRunning transaction check\nRunning transaction test\nTransaction 
> test succeeded\nRunning transaction\n  Installing : 
> marathon-1.3.0-1.0.506.el7.x86_64                            1/1 \nerror: 
> unpacking of archive failed on file /usr/bin/marathon;57daffff: cpio: read\n  
> Verifying  : marathon-1.3.0-1.0.506.el7.x86_64                            1/1 
> \n\nFailed:\n  marathon.x86_64 0:1.3.0-1.0.506.el7                            
>                \n\nComplete!\n"]}
> failed: [52.87.235.79] (item=[u'mesos', u'marathon']) => {"changed": true, 
> "failed": true, "item": ["mesos", "marathon"], "msg": "Error unpacking rpm 
> package marathon-1.3.0-1.0.506.el7.x86_64\n", "rc": 1, "results": ["All 
> packages providing mesos are up to date", "Loaded plugins: 
> fastestmirror\nLoading mirror speeds from cached hostfile\n * base: 
> mirrors.tripadvisor.com <http://mirrors.tripadvisor.com/>\n * extras: 
> mirrors.evowise.com <http://mirrors.evowise.com/>\n * updates: 
> mirrors.greenmountainaccess.net 
> <http://mirrors.greenmountainaccess.net/>\nResolving Dependencies\n--> 
> Running transaction check\n---> Package marathon.x86_64 0:1.3.0-1.0.506.el7 
> will be installed\n--> Finished Dependency Resolution\n\nDependencies 
> Resolved\n\n================================================================================\n
>  Package         Arch          Version                  Repository         
> Size\n================================================================================\nInstalling:\n
>  marathon        x86_64        1.3.0-1.0.506.el7        mesosphere         17 
> M\n\nTransaction 
> Summary\n================================================================================\nInstall
>   1 Package\n\nTotal download size: 17 M\nInstalled size: 89 M\nDownloading 
> packages:\nRunning transaction check\nRunning transaction test\nTransaction 
> test succeeded\nRunning transaction\n  Installing : 
> marathon-1.3.0-1.0.506.el7.x86_64                            1/1 \nerror: 
> unpacking of archive failed on file /usr/bin/marathon;57daffff: cpio: read\n  
> Verifying  : marathon-1.3.0-1.0.506.el7.x86_64                            1/1 
> \n\nFailed:\n  marathon.x86_64 0:1.3.0-1.0.506.el7                            
>                \n\nComplete!\n"]}
> failed: [54.167.94.186] (item=[u'mesos', u'marathon']) => {"changed": true, 
> "failed": true, "item": ["mesos", "marathon"], "msg": "Error unpacking rpm 
> package marathon-1.3.0-1.0.506.el7.x86_64\n", "rc": 1, "results": ["All 
> packages providing mesos are up to date", "Loaded plugins: 
> fastestmirror\nLoading mirror speeds from cached hostfile\n * base: 
> mirrors.tripadvisor.com <http://mirrors.tripadvisor.com/>\n * extras: 
> mirrors.evowise.com <http://mirrors.evowise.com/>\n * updates: 
> mirrors.greenmountainaccess.net 
> <http://mirrors.greenmountainaccess.net/>\nResolving Dependencies\n--> 
> Running transaction check\n---> Package marathon.x86_64 0:1.3.0-1.0.506.el7 
> will be installed\n--> Finished Dependency Resolution\n\nDependencies 
> Resolved\n\n================================================================================\n
>  Package         Arch          Version                  Repository         
> Size\n================================================================================\nInstalling:\n
>  marathon        x86_64        1.3.0-1.0.506.el7        mesosphere         17 
> M\n\nTransaction 
> Summary\n================================================================================\nInstall
>   1 Package\n\nTotal download size: 17 M\nInstalled size: 89 M\nDownloading 
> packages:\nRunning transaction check\nRunning transaction test\nTransaction 
> test succeeded\nRunning transaction\n  Installing : 
> marathon-1.3.0-1.0.506.el7.x86_64                            1/1 \nerror: 
> unpacking of archive failed on file /usr/bin/marathon;57daffff: cpio: read\n  
> Verifying  : marathon-1.3.0-1.0.506.el7.x86_64                            1/1 
> \n\nFailed:\n  marathon.x86_64 0:1.3.0-1.0.506.el7                            
>                \n\nComplete!\n"]}
>  
> NO MORE HOSTS LEFT 
> *************************************************************
>  
> RUNNING HANDLER [zookeeper : restart zookeeper] 
> ********************************
> [WARNING]: Could not create retry file 'site.retry'.         [Errno 2] No 
> such file or directory: ''
>  
>  
> PLAY RECAP 
> *********************************************************************
> 52.87.235.79               : ok=17   changed=2    unreachable=0    failed=1   
> 52.91.152.1                : ok=17   changed=2    unreachable=0    failed=1   
> 54.167.94.186              : ok=17   changed=2    unreachable=0    failed=1  
> localhost                  : ok=1    changed=0    unreachable=0    failed=0
>  
> Is there some step that I am missing? It looks like the instances are not 
> able to communicate because of the firewall? This is just a wild guess. Any 
> help here is appreciated.
> 
> Thanks and Regards,
> Gourav Shenoy
>  
> From: Shameera Rathnayaka <shameerai...@gmail.com>
> Reply-To: "dev@airavata.apache.org" <dev@airavata.apache.org>
> Date: Monday, September 12, 2016 at 11:19 AM
> To: dev <dev@airavata.apache.org>
> Subject: Spinup Mesos-Marathon Cluster for Hybrid Scheduling
>  
> Hi Dev, 
>  
> As part of the effort to use cloud infrastructure to run MPI and big data 
> jobs with Airavata, we use Apache Mesos as the resource allocation framework 
> to manage different types of clusters (i.e., HPC node clusters to run MPI 
> jobs, and Spark and Hadoop clusters to run big data applications). I came up 
> with an Ansible script to spin up a Mesos cluster on a target set of nodes. 
> You can find the script here: https://github.com/shamrath/mesos-deployment 
> I am thinking of moving this code to Airavata if all agree. I would be happy 
> to answer any questions related to this. 
>  
> Thanks, 
> Shameera.
> --
> Shameera Rathnayaka
>  
