Sounds like you may want to use the pause module to introduce a "sleep" in
there?

The wait_for module is frequently not enough to wait for a reboot because
SSH isn't ready (though this isn't a hang scenario, but a connection
failure), for instance.



On Sat, Nov 1, 2014 at 3:45 AM, Tony Kinsley <[email protected]> wrote:

> Today I stumbled upon what I think is an extremely rare bug with the
> paramiko ssh connection plugin. It seems that something my team added
> recently to our startup scripts has caused the ssh connection to hang when
> attempting to connect close to immediately after a reboot. I setup a
> seperate playbook that would reboot a VM and then run a debug task after
> the reboot ( plus gathering facts ). The output including all of my debug
> print statements. I could not use pdb at all with this problem as adding
> the slightest delay would make the problem go away.
>
> PLAY [vms]
> ********************************************************************
>
> TASK: [power on vsphere]
> ******************************************************
> <127.0.0.1> REMOTE_MODULE vsphere login=root password=VALUE_HIDDEN
> host=192.168.140.200
>  ### Executing Module ###
> ### LATE NEEDS TMP PATH ###
> ### NEED TMP PATH ###
> ### prior to conn.shell.mkdtemp ###
> ### post conn.shell.mkdtemp cmd = mkdir -p
> $HOME/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849 && chmod a+rx
> $HOME/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849 && echo
> $HOME/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849 ###
> <127.0.0.1> EXEC ['/bin/sh', '-c', 'mkdir -p
> $HOME/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849 && chmod a+rx
> $HOME/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849 && echo
> $HOME/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849']
> ### post low_level_exec_command result = {'stdout':
> '/home/akinsley/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849\n',
> 'stderr': '', 'rc': 0} ###
> ### tmp path =
> /home/akinsley/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849/  ###
> ### remote module path
> /home/akinsley/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849/vsphere
> ###
> ### TRANSFERING STRING ###
> <127.0.0.1> PUT /tmp/tmpYLi4mm TO
> /home/akinsley/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849/vsphere
> <127.0.0.1> EXEC ['/bin/sh', '-c', u'LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8
> /usr/bin/env python
> /home/akinsley/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849/vsphere']
> changed: [192.168.140.119 -> 127.0.0.1] => {"changed": true, "msg":
> "\"'vim.Task:task-113516'\" completed successfully."}
>
> TASK: [power on nova]
> *********************************************************
> skipping: [192.168.140.119]
>
> TASK: [wait for ssh]
> **********************************************************
> <127.0.0.1> REMOTE_MODULE wait_for port=22 host=192.168.140.119
>  ### Executing Module ###
> ### LATE NEEDS TMP PATH ###
> ### NEED TMP PATH ###
> ### prior to conn.shell.mkdtemp ###
> ### post conn.shell.mkdtemp cmd = mkdir -p
> $HOME/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606 && chmod a+rx
> $HOME/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606 && echo
> $HOME/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606 ###
> <127.0.0.1> EXEC ['/bin/sh', '-c', 'mkdir -p
> $HOME/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606 && chmod a+rx
> $HOME/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606 && echo
> $HOME/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606']
> ### post low_level_exec_command result = {'stdout':
> '/home/akinsley/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606\n',
> 'stderr': '', 'rc': 0} ###
> ### tmp path =
> /home/akinsley/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606/  ###
> ### remote module path
> /home/akinsley/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606/wait_for
> ###
> ### TRANSFERING STRING ###
> <127.0.0.1> PUT /tmp/tmptiaR4f TO
> /home/akinsley/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606/wait_for
> <127.0.0.1> EXEC ['/bin/sh', '-c', u'LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8
> /usr/bin/python
> /home/akinsley/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606/wait_for']
> ok: [192.168.140.119 -> 127.0.0.1] => {"changed": false, "elapsed": 21,
> "path": null, "port": 22, "search_regex": null, "state": "started"}
>
> PLAY [all]
> ********************************************************************
>
> GATHERING FACTS
> ***************************************************************
> <192.168.140.119> ESTABLISH CONNECTION FOR USER: root on PORT 22 TO
> 192.168.140.119
> <192.168.140.119> REMOTE_MODULE setup
>  ### Executing Module ###
> ### LATE NEEDS TMP PATH ###
> ### NEED TMP PATH ###
> ### prior to conn.shell.mkdtemp ###
> ### post conn.shell.mkdtemp cmd = mkdir -p
> $HOME/.ansible/tmp/ansible-tmp-1414825558.04-215745388264747 && echo
> $HOME/.ansible/tmp/ansible-tmp-1414825558.04-215745388264747 ###
> ### Getting channel ###
>
>
> hangs here indefinitely
>
> So the output would just hang there indefinitely. I traced this back to
> these two lines in the source code.
>
>             print '\t\t### Getting channel ###'
>             chan = self.ssh.get_transport().open_session()
>             print '\t\t### Got channel %s ###' % chan
>             self.ssh.get_transport().set_keepalive(5)
>             print '\t\t### Got transport ###'
>
> I happen to work with the engineer who submitted the original pull request
> adding that self.ssh.get_transport().set_keepalive(5) and realized what our
> problem must be. We saw this exact same issue when installing new iptables
> rules and applying them. In this case we have an upstart script applying
> our rules on boot. Recently we have been tweaking other startup scripts so
> I think we must have changed our startup order enough that the iptables
> rules are getting applied at the perfect time. I don't know I really cannot
> explain the issue any other way, but the window of time that could occur
> seems to be so tiny and I could reproduce this issue over and over again.
>
> Regardless the fix is pretty harmless, just set the keepalive before
> trying to open the session. I am going to submit a pull request and really
> wanted to explain why the request was needed.
>
> Thanks,
> Tony
>
>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "Ansible Project" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/ansible-project/cbb9b499-7264-4750-891e-c1860c012a64%40googlegroups.com
> <https://groups.google.com/d/msgid/ansible-project/cbb9b499-7264-4750-891e-c1860c012a64%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Ansible Project" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/ansible-project/CA%2BnsWgzP1a9iw5b_QJ5yLBuzr2Y03UYppz7Y_j7kb%2Bhy9QLySg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to