Sounds like you may want to use the pause module to introduce a "sleep" in there?
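If a fixed sleep feels too blunt, a stricter readiness probe is another option. An open port only proves that something is listening on 22, so the rough sketch below waits until the server actually sends its SSH identification string. It is only an illustration: `wait_for_ssh_banner` and its parameters are made-up names, not part of Ansible or paramiko, it is written for Python 3, and it assumes the identification string is the first thing the server sends.

```python
import socket
import time


def wait_for_ssh_banner(host, port=22, timeout=60.0, interval=1.0):
    """Return True once `host:port` sends a line starting with "SSH-".

    Hypothetical helper for illustration; not an Ansible or paramiko API.
    Returns False if no banner is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect only shows the port is open ...
            with socket.create_connection((host, port), timeout=interval) as sock:
                sock.settimeout(interval)
                # ... so also require the server's identification string.
                banner = sock.recv(256)
                if banner.startswith(b"SSH-"):
                    return True
        except OSError:
            # Refused, reset, or timed out: the host is not ready yet.
            pass
        time.sleep(interval)
    return False
```

Something like `wait_for_ssh_banner("192.168.140.119", timeout=120)` could run on the control machine before the next play starts. Note that it would not help with the hang described in the quoted message, which happens after the connection is established.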
For instance, the wait_for module is frequently not enough to wait for a reboot because SSH isn't fully ready yet (though that is normally a connection failure, not a hang).

On Sat, Nov 1, 2014 at 3:45 AM, Tony Kinsley <[email protected]> wrote:

> Today I stumbled upon what I think is an extremely rare bug with the
> paramiko ssh connection plugin. It seems that something my team added
> recently to our startup scripts has caused the ssh connection to hang when
> attempting to connect almost immediately after a reboot. I set up a
> separate playbook that would reboot a VM and then run a debug task after
> the reboot (plus gathering facts). The output, including all of my debug
> print statements, follows. I could not use pdb at all with this problem, as
> adding the slightest delay would make the problem go away.
>
> PLAY [vms] ********************************************************************
>
> TASK: [power on vsphere] ******************************************************
> <127.0.0.1> REMOTE_MODULE vsphere login=root password=VALUE_HIDDEN host=192.168.140.200
> ### Executing Module ###
> ### LATE NEEDS TMP PATH ###
> ### NEED TMP PATH ###
> ### prior to conn.shell.mkdtemp ###
> ### post conn.shell.mkdtemp cmd = mkdir -p $HOME/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849 && echo $HOME/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849 ###
> <127.0.0.1> EXEC ['/bin/sh', '-c', 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849 && echo $HOME/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849']
> ### post low_level_exec_command result = {'stdout': '/home/akinsley/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849\n', 'stderr': '', 'rc': 0} ###
> ### tmp path = /home/akinsley/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849/ ###
> ### remote module path /home/akinsley/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849/vsphere ###
> ### TRANSFERING STRING ###
> <127.0.0.1> PUT /tmp/tmpYLi4mm TO /home/akinsley/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849/vsphere
> <127.0.0.1> EXEC ['/bin/sh', '-c', u'LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/env python /home/akinsley/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849/vsphere']
> changed: [192.168.140.119 -> 127.0.0.1] => {"changed": true, "msg": "\"'vim.Task:task-113516'\" completed successfully."}
>
> TASK: [power on nova] *********************************************************
> skipping: [192.168.140.119]
>
> TASK: [wait for ssh] **********************************************************
> <127.0.0.1> REMOTE_MODULE wait_for port=22 host=192.168.140.119
> ### Executing Module ###
> ### LATE NEEDS TMP PATH ###
> ### NEED TMP PATH ###
> ### prior to conn.shell.mkdtemp ###
> ### post conn.shell.mkdtemp cmd = mkdir -p $HOME/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606 && echo $HOME/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606 ###
> <127.0.0.1> EXEC ['/bin/sh', '-c', 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606 && echo $HOME/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606']
> ### post low_level_exec_command result = {'stdout': '/home/akinsley/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606\n', 'stderr': '', 'rc': 0} ###
> ### tmp path = /home/akinsley/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606/ ###
> ### remote module path /home/akinsley/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606/wait_for ###
> ### TRANSFERING STRING ###
> <127.0.0.1> PUT /tmp/tmptiaR4f TO /home/akinsley/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606/wait_for
> <127.0.0.1> EXEC ['/bin/sh', '-c', u'LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/akinsley/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606/wait_for']
> ok: [192.168.140.119 -> 127.0.0.1] => {"changed": false, "elapsed": 21, "path": null, "port": 22, "search_regex": null, "state": "started"}
>
> PLAY [all] ********************************************************************
>
> GATHERING FACTS ***************************************************************
> <192.168.140.119> ESTABLISH CONNECTION FOR USER: root on PORT 22 TO 192.168.140.119
> <192.168.140.119> REMOTE_MODULE setup
> ### Executing Module ###
> ### LATE NEEDS TMP PATH ###
> ### NEED TMP PATH ###
> ### prior to conn.shell.mkdtemp ###
> ### post conn.shell.mkdtemp cmd = mkdir -p $HOME/.ansible/tmp/ansible-tmp-1414825558.04-215745388264747 && echo $HOME/.ansible/tmp/ansible-tmp-1414825558.04-215745388264747 ###
> ### Getting channel ###
>
> hangs here indefinitely
>
> So the output would just hang there indefinitely. I traced this back to
> these two lines in the source code (shown with my debug prints around
> them).
>
>     print '\t\t### Getting channel ###'
>     chan = self.ssh.get_transport().open_session()
>     print '\t\t### Got channel %s ###' % chan
>     self.ssh.get_transport().set_keepalive(5)
>     print '\t\t### Got transport ###'
>
> I happen to work with the engineer who submitted the original pull request
> adding that self.ssh.get_transport().set_keepalive(5), and realized what our
> problem must be. We saw this exact same issue when installing new iptables
> rules and applying them. In this case we have an upstart script applying
> our rules on boot. Recently we have been tweaking other startup scripts, so
> I think we must have changed our startup order enough that the iptables
> rules are getting applied at exactly the wrong moment. I really cannot
> explain the issue any other way. The window of time in which that could
> occur seems tiny, and yet I could reproduce this issue over and over again.
>
> Regardless, the fix is pretty harmless: just set the keepalive before
> trying to open the session. I am going to submit a pull request and really
> wanted to explain why the request was needed.
>
> Thanks,
> Tony
>
> --
> You received this message because you are subscribed to the Google Groups
> "Ansible Project" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/ansible-project/cbb9b499-7264-4750-891e-c1860c012a64%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
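
For reference, the reordering Tony describes (set the keepalive first, then open the session) might look roughly like the sketch below. This is not the actual patch: `open_session_with_keepalive` is a made-up helper name, and the only thing assumed about `transport` is that it offers `set_keepalive()` and `open_session()` the way a paramiko `Transport` does. Whether the keepalive really unblocks a stalled `open_session()` is taken from the report above, not something verified here.

```python
def open_session_with_keepalive(transport, interval=5):
    """Enable keepalives on `transport`, then open a session channel.

    Hypothetical helper for illustration. `transport` is expected to behave
    like a paramiko Transport, i.e. to provide set_keepalive(interval) and
    open_session().
    """
    # Keepalive first, so it is already active if the session request stalls.
    transport.set_keepalive(interval)
    # Only now request the channel; this is the call that hung in the report.
    return transport.open_session()


# Possible use inside a connection plugin (assumes `self.ssh` is a connected
# paramiko.SSHClient, as in the quoted snippet):
#
#     chan = open_session_with_keepalive(self.ssh.get_transport())
```

The value 5 simply mirrors the interval used in the quoted snippet.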
