Today I stumbled upon what I think is an extremely rare bug with the
paramiko ssh connection plugin. It seems that something my team added
recently to our startup scripts has caused the ssh connection to hang when
attempting to connect close to immediately after a reboot. I setup a
seperate playbook that would reboot a VM and then run a debug task after
the reboot ( plus gathering facts ). The output including all of my debug
print statements. I could not use pdb at all with this problem as adding
the slightest delay would make the problem go away.
PLAY [vms]
********************************************************************
TASK: [power on vsphere]
******************************************************
<127.0.0.1> REMOTE_MODULE vsphere login=root password=VALUE_HIDDEN
host=192.168.140.200
### Executing Module ###
### LATE NEEDS TMP PATH ###
### NEED TMP PATH ###
### prior to conn.shell.mkdtemp ###
### post conn.shell.mkdtemp cmd = mkdir -p
$HOME/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849 && chmod a+rx
$HOME/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849 && echo
$HOME/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849 ###
<127.0.0.1> EXEC ['/bin/sh', '-c', 'mkdir -p
$HOME/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849 && chmod a+rx
$HOME/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849 && echo
$HOME/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849']
### post low_level_exec_command result = {'stdout':
'/home/akinsley/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849\n',
'stderr': '', 'rc': 0} ###
### tmp path =
/home/akinsley/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849/ ###
### remote module path
/home/akinsley/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849/vsphere
###
### TRANSFERING STRING ###
<127.0.0.1> PUT /tmp/tmpYLi4mm TO
/home/akinsley/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849/vsphere
<127.0.0.1> EXEC ['/bin/sh', '-c', u'LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8
/usr/bin/env python
/home/akinsley/.ansible/tmp/ansible-tmp-1414825524.97-171497198370849/vsphere']
changed: [192.168.140.119 -> 127.0.0.1] => {"changed": true, "msg":
"\"'vim.Task:task-113516'\" completed successfully."}
TASK: [power on nova]
*********************************************************
skipping: [192.168.140.119]
TASK: [wait for ssh]
**********************************************************
<127.0.0.1> REMOTE_MODULE wait_for port=22 host=192.168.140.119
### Executing Module ###
### LATE NEEDS TMP PATH ###
### NEED TMP PATH ###
### prior to conn.shell.mkdtemp ###
### post conn.shell.mkdtemp cmd = mkdir -p
$HOME/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606 && chmod a+rx
$HOME/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606 && echo
$HOME/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606 ###
<127.0.0.1> EXEC ['/bin/sh', '-c', 'mkdir -p
$HOME/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606 && chmod a+rx
$HOME/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606 && echo
$HOME/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606']
### post low_level_exec_command result = {'stdout':
'/home/akinsley/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606\n',
'stderr': '', 'rc': 0} ###
### tmp path =
/home/akinsley/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606/ ###
### remote module path
/home/akinsley/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606/wait_for
###
### TRANSFERING STRING ###
<127.0.0.1> PUT /tmp/tmptiaR4f TO
/home/akinsley/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606/wait_for
<127.0.0.1> EXEC ['/bin/sh', '-c', u'LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8
/usr/bin/python
/home/akinsley/.ansible/tmp/ansible-tmp-1414825535.28-82995727032606/wait_for']
ok: [192.168.140.119 -> 127.0.0.1] => {"changed": false, "elapsed": 21,
"path": null, "port": 22, "search_regex": null, "state": "started"}
PLAY [all]
********************************************************************
GATHERING FACTS
***************************************************************
<192.168.140.119> ESTABLISH CONNECTION FOR USER: root on PORT 22 TO
192.168.140.119
<192.168.140.119> REMOTE_MODULE setup
### Executing Module ###
### LATE NEEDS TMP PATH ###
### NEED TMP PATH ###
### prior to conn.shell.mkdtemp ###
### post conn.shell.mkdtemp cmd = mkdir -p
$HOME/.ansible/tmp/ansible-tmp-1414825558.04-215745388264747 && echo
$HOME/.ansible/tmp/ansible-tmp-1414825558.04-215745388264747 ###
### Getting channel ###
hangs here indefinitely
So the output would just hang there indefinitely. I traced this back to
these two lines in the source code.
print '\t\t### Getting channel ###'
chan = self.ssh.get_transport().open_session()
print '\t\t### Got channel %s ###' % chan
self.ssh.get_transport().set_keepalive(5)
print '\t\t### Got transport ###'
I happen to work with the engineer who submitted the original pull request
adding that self.ssh.get_transport().set_keepalive(5) and realized what our
problem must be. We saw this exact same issue when installing new iptables
rules and applying them. In this case we have an upstart script applying
our rules on boot. Recently we have been tweaking other startup scripts so
I think we must have changed our startup order enough that the iptables
rules are getting applied at the perfect time. I don't know I really cannot
explain the issue any other way, but the window of time that could occur
seems to be so tiny and I could reproduce this issue over and over again.
Regardless the fix is pretty harmless, just set the keepalive before trying
to open the session. I am going to submit a pull request and really wanted
to explain why the request was needed.
Thanks,
Tony
--
You received this message because you are subscribed to the Google Groups
"Ansible Project" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/ansible-project/cbb9b499-7264-4750-891e-c1860c012a64%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.