Thanks for the reply, Enis.

I had some off-list discussion with Simon Gladman about this. Simon
indicated that OpenStack security groups do not seem to be propagating
to worker nodes at startup. Manually configuring the worker node's
security group and then rebooting the master appears to work around
the problem. Hopefully this is only a temporary issue, and obviously
one that's specific to NeCTAR.
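
For anyone who hits the same symptom, here is a rough sketch of how one might check from a worker whether the master's AMQP port is reachable at all. "[Errno 110] Connection timed out" usually means packets are being silently dropped, which fits a missing security group rule. The port number 5672 (the standard AMQP default) and the function name are my assumptions, not something taken from the CloudMan code, and the master address is a placeholder you would fill in yourself.

```python
import socket


def port_reachable(host, port=5672, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened.

    5672 is the standard AMQP port; that CloudMan's master listens
    there is an assumption, so adjust it if your setup differs.
    """
    try:
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Covers both "connection refused" and "timed out".
        return False
    sock.close()
    return True


# Hypothetical usage on a worker node (replace with the master's address):
# print(port_reachable("MASTER_PRIVATE_IP"))
```

If this returns False from the worker but True when run on the master itself against 127.0.0.1, the security group rules are the likely culprit.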

Cheers,
-Aaron

On Thu, 2015-10-15 at 21:59 -0400, Enis Afgan wrote:
> Hi Aaron,
> 
> Does the "AMQP Connection Failure:" error continue indefinitely?
> It would be helpful to see the log files from the time a worker is
> being added, as well as the CloudMan logs from a worker node. It
> appears you're launching GVL v3.04; however, the default is now 4.0.0.
> Have you tried keeping the defaults?
> 
> 
> 
> 
> PS
> I'm cc'ing h...@genome.edu.au as the default help mailing list for the
> GVL.
> 
> 
> On Wed, Oct 14, 2015 at 11:31 PM, Aaron Darling
> <aaron.darl...@uts.edu.au> wrote:
> 
>         Hi all, I'm new to CloudMan, and trying to launch a cluster
>         via GVL (3 or 4) on NeCTAR.
>         I'm able to get a head node running without trouble via
>         launch.genome.edu.au, but launching worker nodes from the
>         CloudMan interface appears to fail. CloudMan reboots the
>         worker repeatedly before giving up. I logged into the worker
>         to inspect log files and found the following, but it's not
>         obvious to me what to do next. Hope this is something simple?
>         
>         
>         ubuntu@server-fbbd9a10-fb58-48d8-89cd-5ddd22821648:~$
>         cat /mnt/cm/paster.log
>         Python version:  (2, 7)
>         Image configuration suports: {'apps': ['cloudman', 'galaxy']}
>         2015-10-15 14:15:24,973 DEBUG            app:73   Initializing
>         app
>         2015-10-15 14:15:24,973 DEBUG            ec2:109  Gathering
>         instance zone, attempt 0
>         2015-10-15 14:15:25,140 DEBUG            ec2:115  Instance
>         zone is 'NCI'
>         2015-10-15 14:15:25,140 DEBUG            ec2:44   Gathering
>         instance ami, attempt 0
>         2015-10-15 14:15:25,459 DEBUG            app:76   Running on
>         'openstack' type of cloud in zone 'NCI' using image
>         'ami-00003484'.
>         2015-10-15 14:15:25,459 DEBUG            app:98   Getting
>         pd.yaml
>         2015-10-15 14:15:25,459 DEBUG      openstack:99   Establishing
>         a boto Swift connection.
>         2015-10-15 14:15:25,459 DEBUG      openstack:109  Got boto
>         Swift connection.
>         2015-10-15 14:15:26,112 DEBUG           misc:578  Retrieved
>         file 'persistent_data.yaml' from bucket
>         'cm-45b53bf5024e962bd27e15fd81fcc07d' on host
>         'swift.rc.nectar.org.au' to 'pd.yaml'.
>         2015-10-15 14:15:26,118 INFO             app:119  Worker
>         starting
>         2015-10-15 14:15:26,136 DEBUG            ec2:76   Gathering
>         instance id, attempt 0
>         2015-10-15 14:15:26,338 DEBUG            ec2:82   Instance ID
>         is 'i-0019a2fc'
>         2015-10-15 14:16:29,488 DEBUG           comm:134  AMQP
>         Connection Failure:  [Errno 110] Connection timed out
>         2015-10-15 14:16:29,492 DEBUG           base:57   Enabling
>         'root' controller, class: CM
>         2015-10-15 14:16:29,494 DEBUG       buildapp:93   Enabling
>         'httpexceptions' middleware
>         2015-10-15 14:16:29,496 DEBUG       buildapp:99   Enabling
>         'recursive' middleware
>         2015-10-15 14:16:29,499 DEBUG       buildapp:119  Enabling
>         'print debug' middleware
>         2015-10-15 14:16:29,506 DEBUG       buildapp:133  Enabling
>         'error' middleware
>         2015-10-15 14:16:29,507 DEBUG       buildapp:143  Enabling
>         'config' middleware
>         2015-10-15 14:16:29,508 DEBUG       buildapp:147  Enabling
>         'x-forwarded-host' middleware
>         2015-10-15 14:16:29,517 DEBUG           misc:768
>         'cp /etc/hosts /etc/hosts.orig' command OK
>         2015-10-15 14:16:29,528 DEBUG           misc:768
>         'cp /tmp/tmpuV3NTJ /etc/hosts' command OK
>         Starting server in PID 2825.
>         2015-10-15 14:16:29,533 DEBUG           misc:768  'chmod
>         644 /etc/hosts' command OK
>         2015-10-15 14:16:29,533 DEBUG         worker:558  Trying to
>         setup AMQP connection; conn = '<cm.util.comm.CMWorkerComm
>         object at 0x2743950>'
>         serving on 0.0.0.0:42284 view at http://127.0.0.1:42284
>         2015-10-15 14:17:32,656 DEBUG           comm:134  AMQP
>         Connection Failure:  [Errno 110] Connection timed out
>         2015-10-15 14:17:32,656 DEBUG         worker:558  Trying to
>         setup AMQP connection; conn = '<cm.util.comm.CMWorkerComm
>         object at 0x2743950>'
>         2015-10-15 14:18:35,760 DEBUG           comm:134  AMQP
>         Connection Failure:  [Errno 110] Connection timed out
>         2015-10-15 14:18:35,760 DEBUG         worker:558  Trying to
>         setup AMQP connection; conn = '<cm.util.comm.CMWorkerComm
>         object at 0x2743950>'
>         2015-10-15 14:19:38,864 DEBUG           comm:134  AMQP
>         Connection Failure:  [Errno 110] Connection timed out
>         2015-10-15 14:19:38,864 DEBUG         worker:558  Trying to
>         setup AMQP connection; conn = '<cm.util.comm.CMWorkerComm
>         object at 0x2743950>'
>         
>         
>         
>         
>         ubuntu@server-fbbd9a10-fb58-48d8-89cd-5ddd22821648:~$
>         cat /tmp/cm/cm_boot.py.log 
>         2015-10-15 14:23:43,713 DEBUG  cm_boot:430 - virtual-burrito
>         seems to be installed
>         2015-10-15 14:23:44,037 DEBUG  cm_boot:25  - Successfully ran
>         '/bin/bash -l -c 'VIRTUALENVWRAPPER_LOG_DIR=/tmp/;
>         HOME=/home/ubuntu; . /home/ubuntu/.venvburrito/startup.sh;
>         lsvirtualenv | grep CM''
>         2015-10-15 14:23:44,037 DEBUG  cm_boot:433 - 'CM' virtualenv
>         found
>         2015-10-15 14:23:44,049 DEBUG  cm_boot:493 - Fixing /etc/hosts
>         on NeCTAR
>         2015-10-15 14:23:44,930 INFO   cm_boot:244 - << Starting nginx
>         >>
>         2015-10-15 14:23:44,931 DEBUG  cm_boot:169 - Reconfiguring
>         nginx conf
>         2015-10-15 14:23:44,931 INFO   cm_boot:286 - Attempting to
>         configure max_client_body_size in /usr/nginx/conf/nginx.conf
>         2015-10-15 14:23:44,934 DEBUG  cm_boot:25  - Successfully ran
>         'cp /usr/nginx/conf/nginx.conf /tmp/cm/original_nginx.conf'
>         2015-10-15 14:23:44,936 DEBUG  cm_boot:25  - Successfully ran
>         'uniq /tmp/cm/original_nginx.conf
>         > /usr/nginx/conf/nginx.conf'
>         2015-10-15 14:23:44,937 DEBUG  cm_boot:25  - Successfully ran
>         'grep 'client_max_body_size' /usr/nginx/conf/nginx.conf'
>         2015-10-15 14:23:44,938 DEBUG  cm_boot:265 - Creating tmp dir
>         for nginx /mnt/galaxy/upload_store
>         2015-10-15 14:23:44,938 DEBUG  cm_boot:68  -
>         Checking /usr/local/sbin/nginx
>         2015-10-15 14:23:44,938 DEBUG  cm_boot:58
>         - /usr/local/sbin/nginx is file: False; it's executable: False
>         2015-10-15 14:23:44,938 DEBUG  cm_boot:68  -
>         Checking /usr/local/bin/nginx
>         2015-10-15 14:23:44,938 DEBUG  cm_boot:58
>         - /usr/local/bin/nginx is file: False; it's executable: False
>         2015-10-15 14:23:44,938 DEBUG  cm_boot:68  -
>         Checking /usr/bin/nginx
>         2015-10-15 14:23:44,938 DEBUG  cm_boot:58  - /usr/bin/nginx is
>         file: False; it's executable: False
>         2015-10-15 14:23:44,938 DEBUG  cm_boot:68  -
>         Checking /usr/sbin/nginx
>         2015-10-15 14:23:44,938 DEBUG  cm_boot:58  - /usr/sbin/nginx
>         is file: False; it's executable: False
>         2015-10-15 14:23:44,938 DEBUG  cm_boot:68  -
>         Checking /sbin/nginx
>         2015-10-15 14:23:44,939 DEBUG  cm_boot:58  - /sbin/nginx is
>         file: False; it's executable: False
>         2015-10-15 14:23:44,939 DEBUG  cm_boot:68  -
>         Checking /bin/nginx
>         2015-10-15 14:23:44,939 DEBUG  cm_boot:58  - /bin/nginx is
>         file: False; it's executable: False
>         2015-10-15 14:23:44,939 DEBUG  cm_boot:68  -
>         Checking /usr/sbin/nginx
>         2015-10-15 14:23:44,939 DEBUG  cm_boot:58  - /usr/sbin/nginx
>         is file: False; it's executable: False
>         2015-10-15 14:23:44,939 DEBUG  cm_boot:68  -
>         Checking /usr/nginx/sbin/nginx
>         2015-10-15 14:23:44,939 DEBUG  cm_boot:58
>         - /usr/nginx/sbin/nginx is file: True; it's executable: True
>         2015-10-15 14:23:44,939 DEBUG  cm_boot:270 - Using
>         '/usr/nginx/sbin/nginx' as the nginx executable
>         2015-10-15 14:23:44,946 ERROR  cm_boot:31  - Error running 'ps
>         xa | grep nginx | grep -v grep'. Process returned code '1' and
>         following stderr: ''
>         2015-10-15 14:23:44,964 DEBUG  cm_boot:25  - Successfully ran
>         '/usr/nginx/sbin/nginx'
>         2015-10-15 14:23:44,966 DEBUG  cm_boot:25  - Successfully ran
>         'rm -rf /mnt/galaxy/upload_store'
>         2015-10-15 14:23:44,966 DEBUG  cm_boot:281 - Deleting tmp dir
>         for nginx /mnt/galaxy/upload_store
>         2015-10-15 14:23:44,966 INFO   cm_boot:339 - << Downloading
>         CloudMan >>
>         2015-10-15 14:23:44,966 DEBUG  cm_boot:43  - Checking
>         existence of directory '/mnt/cm'
>         2015-10-15 14:23:44,966 DEBUG  cm_boot:52  - Directory
>         '/mnt/cm' exists.
>         2015-10-15 14:23:44,966 DEBUG  cm_boot:344 - Using
>         user-provided default bucket: cloudman-gvl-304
>         2015-10-15 14:23:44,966 INFO   cm_boot:324 - Connecting to a
>         custom Object Store
>         2015-10-15 14:23:44,967 DEBUG  cm_boot:333 - Got boto S3
>         connection: S3Connection:swift.rc.nectar.org.au
>         2015-10-15 14:23:44,967 DEBUG  cm_boot:210 - Checking if key
>         'cm.tar.gz' exists in bucket
>         'cm-45b53bf5024e962bd27e15fd81fcc07d'
>         2015-10-15 14:23:45,276 INFO   cm_boot:356 - CloudMan found in
>         cluster bucket 'cm-45b53bf5024e962bd27e15fd81fcc07d'.
>         2015-10-15 14:23:45,276 DEBUG  cm_boot:190 - Getting file
>         cm.tar.gz from bucket cm-45b53bf5024e962bd27e15fd81fcc07d
>         2015-10-15 14:23:45,276 DEBUG  cm_boot:194 - Attempting to
>         retrieve file 'cm.tar.gz' from bucket
>         'cm-45b53bf5024e962bd27e15fd81fcc07d'
>         2015-10-15 14:23:45,726 INFO   cm_boot:197 - Successfully
>         retrieved file 'cm.tar.gz' from bucket
>         'cm-45b53bf5024e962bd27e15fd81fcc07d' via connection
>         'swift.rc.nectar.org.au' to '/mnt/cm/cm.tar.gz'
>         2015-10-15 14:23:45,727 DEBUG  cm_boot:388 - Getting metadata
>         'revision' for file 'cm.tar.gz' from bucket
>         'cm-45b53bf5024e962bd27e15fd81fcc07d'
>         
>         
>         
>         -- 
>         Aaron E. Darling, Ph.D.
>         Associate Professor, ithree institute
>         University of Technology Sydney
>         Australia
>         
>         http://darlinglab.org
>         twitter: @koadman
>         
>         
>         
>         
>         ______________________________________________________________
>         UTS CRICOS Provider Code: 00099F DISCLAIMER: This email
>         message and any accompanying attachments may contain
>         confidential information. If you are not the intended
>         recipient, do not read, use, disseminate, distribute or copy
>         this message or attachments. If you have received this message
>         in error, please notify the sender immediately and delete this
>         message. Any views expressed in this message are those of the
>         individual sender, except where the sender expressly, and with
>         authority, states them to be the views of the University of
>         Technology Sydney. Before opening any attachments, please
>         check them for viruses and defects. Think. Green. Do. Please
>         consider the environment before printing this email. 
>         
>         
>         
>         ___________________________________________________________
>         Please keep all replies on the list by using "reply all"
>         in your mail client.  To manage your subscriptions to this
>         and other Galaxy lists, please use the interface at:
>           https://lists.galaxyproject.org/
>         
>         To search Galaxy mailing lists use the unified search at:
>           http://galaxyproject.org/search/mailinglists/
> 
> 
> 

-- 
Aaron E. Darling, Ph.D.
Associate Professor, ithree institute
University of Technology Sydney
Australia

http://darlinglab.org
twitter: @koadman




