The short version: I'm running tests against Fedora 22 Alpha Atomic host and Server, and am seeing various connection failures depending on the ssh_connection settings. They all look like something reported against openssh-5.3 with the ControlPersist backport. I played around with various values and found a workaround: remove the ControlPersist value. Setting
[ssh_connection]
ssh_args = -o ControlMaster=auto

in ~/.ansible.cfg allows everything to work. The question is why?

The long version:

Looking up the connection issues (SFTP failures, plain connection resets, etc.), I came across this post from last year

https://groups.google.com/forum/#!msg/ansible-project/QUdxNK1zEH0/rQKnO827FUgJ

which had similar-looking failures and led to this RHT Bugzilla:

https://bugzilla.redhat.com/show_bug.cgi?id=1160487

I went through the various options mentioned in the thread in a local ansible.cfg: disabling pipelining, changing to scp, and clearing ssh_args entirely. The last option worked, so I dug further.

The ansible version is:

[fedora@atomic-master ~]$ rpm -q ansible
ansible-1.8.4-1.fc22.noarch

OpenSSH versions on each system:

192.168.122.10 | success | rc=0 >>
openssh-6.7p1-11.fc22.x86_64

192.168.122.11 | success | rc=0 >>
openssh-6.7p1-10.fc22.x86_64

No user-based ssh settings, all values default from /etc/ansible (p1-11 succeeds, p1-10 fails):

<192.168.122.10> PubkeyAuthentication=no ConnectTimeout=10 GSSAPIAuthentication=no ControlPath=/home/fedora/.ansible/cp/ansible-ssh-%h-%p-%r StrictHostKeyChecking=no ControlMaster=auto ControlPersist=60s
<192.168.122.10>
fatal: [192.168.122.10] => failed to transfer file to /home/fedora/.ansible/tmp/ansible-tmp-1427209312.33-30225738744475/setup:

Couldn't read packet: Connection reset by peer

<192.168.122.11> PubkeyAuthentication=no ConnectTimeout=10 GSSAPIAuthentication=no ControlPath=/home/fedora/.ansible/cp/ansible-ssh-%h-%p-%r StrictHostKeyChecking=no ControlMaster=auto ControlPersist=60s
ok: [192.168.122.11]

User-based ~/.ansible.cfg, pipelining enabled (p1-11 succeeds, p1-10 fails):

[defaults]
host_key_checking = False

[ssh_connection]
#ssh_args = -o ControlMaster=auto
#ssh_args =
pipelining = True

<192.168.122.10> PubkeyAuthentication=no 'sudo -k && sudo -H -S -p "[sudo via ansible, key=irpxmyfkxjqtjkyqbfmyvogzyeygovsm] password: " -u root /bin/sh -c '"'"'echo SUDO-SUCCESS-irpxmyfkxjqtjkyqbfmyvogzyeygovsm; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python'"'"'' ConnectTimeout=10 GSSAPIAuthentication=no ControlPath=/home/fedora/.ansible/cp/ansible-ssh-%h-%p-%r StrictHostKeyChecking=no ControlMaster=auto ControlPersist=60s
fatal: [192.168.122.10] => ssh connection error waiting for sudo or su password prompt

<192.168.122.11> PubkeyAuthentication=no 'sudo -k && sudo -H -S -p "[sudo via ansible, key=hdearqulmxcbkpjxjlgwpyiiapebebju] password: " -u root /bin/sh -c '"'"'echo SUDO-SUCCESS-hdearqulmxcbkpjxjlgwpyiiapebebju; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python'"'"'' ConnectTimeout=10 GSSAPIAuthentication=no ControlPath=/home/fedora/.ansible/cp/ansible-ssh-%h-%p-%r StrictHostKeyChecking=no ControlMaster=auto ControlPersist=60s
ok: [192.168.122.11]

User-based ~/.ansible.cfg, pipelining with ControlPersist removed (p1-11 succeeds, p1-10 fails):

[defaults]
host_key_checking = False

[ssh_connection]
ssh_args = -o ControlMaster=auto
#ssh_args =
pipelining = True

<192.168.122.10> PubkeyAuthentication=no 'sudo -k && sudo -H -S -p "[sudo via ansible, key=gbcczsohsqeiuyzxezohqbikibenkeun] password: " -u root /bin/sh -c '"'"'echo SUDO-SUCCESS-gbcczsohsqeiuyzxezohqbikibenkeun; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python'"'"'' ConnectTimeout=10 GSSAPIAuthentication=no StrictHostKeyChecking=no ControlMaster=auto
fatal: [192.168.122.10] => ssh connection error waiting for sudo or su password prompt

<192.168.122.11> PubkeyAuthentication=no 'sudo -k && sudo -H -S -p "[sudo via ansible, key=xjcccpqinmchipiubhsmjwdjvgmytbww] password: " -u root /bin/sh -c '"'"'echo SUDO-SUCCESS-xjcccpqinmchipiubhsmjwdjvgmytbww; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python'"'"'' ConnectTimeout=10 GSSAPIAuthentication=no StrictHostKeyChecking=no ControlMaster=auto
ok: [192.168.122.11]

User-based ~/.ansible.cfg, ControlPersist and pipelining both removed (p1-11 succeeds, p1-10 succeeds):

[defaults]
host_key_checking = False

[ssh_connection]
ssh_args = -o ControlMaster=auto
#ssh_args =
#pipelining = True

<192.168.122.10> PubkeyAuthentication=no 'sudo -k && sudo -H -S -p "[sudo via ansible, key=dyqtvojqcsjschscszupwxnithfwjhqs] password: " -u root /bin/sh -c '"'"'echo SUDO-SUCCESS-dyqtvojqcsjschscszupwxnithfwjhqs; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/fedora/.ansible/tmp/ansible-tmp-1427210773.45-273681836317736/setup; rm -rf /home/fedora/.ansible/tmp/ansible-tmp-1427210773.45-273681836317736/ >/dev/null 2>&1'"'"'' ConnectTimeout=10 GSSAPIAuthentication=no StrictHostKeyChecking=no ControlMaster=auto
ok: [192.168.122.10]

<192.168.122.11> PubkeyAuthentication=no 'sudo -k && sudo -H -S -p "[sudo via ansible, key=gixgtyagbeoctcldxprcqrjwuxdhscvr] password: " -u root /bin/sh -c '"'"'echo SUDO-SUCCESS-gixgtyagbeoctcldxprcqrjwuxdhscvr; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/fedora/.ansible/tmp/ansible-tmp-1427210773.47-126330968681284/setup; rm -rf /home/fedora/.ansible/tmp/ansible-tmp-1427210773.47-126330968681284/ >/dev/null 2>&1'"'"'' ConnectTimeout=10 GSSAPIAuthentication=no StrictHostKeyChecking=no ControlMaster=auto
ok: [192.168.122.11]

Manually testing the ControlPersist values on the command line works as expected for both hosts; the master times out after 60s:

[fedora@atomic-master ansible-atomic]$ ssh -F /dev/null -o ControlMaster=auto -o ControlPersist=60s -S Test_Master_Socket [email protected] echo Hello
[email protected]'s password:
Hello
[fedora@atomic-master ansible-atomic]$ ps -fu `whoami` | grep "[s]sh.*Test_Master_Socket"
fedora 1015 1 0 11:29 ? 00:00:00 ssh: Test_Master_Socket [mux]
[fedora@atomic-master ansible-atomic]$ ssh -F /dev/null -S Test_Master_Socket -O check 192.168.122.11
Master running (pid=1015)

I looked at the Koji page for openssh and don't see anything particular to ControlPersist in the change log, but I'm not an OpenSSH ControlPersist guru.
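
The manual check above could also be pointed at the control sockets Ansible itself creates. What follows is only a rough sketch, assuming the sockets live in ~/.ansible/cp (the directory from the ControlPath in the output above). The function names and the placeholder-host argument are made up for illustration; as far as I know, ssh wants a destination on the command line even with -O, while the master is selected by the socket given to -S.

```shell
#!/bin/sh
# Sketch only: the function names and "placeholder-host" are hypothetical.

# Print every socket file found directly inside the given directory.
list_control_sockets() {
    dir=$1
    [ -d "$dir" ] || return 0
    for path in "$dir"/*; do
        # [ -S ] is true only for socket files, so regular files are skipped.
        if [ -S "$path" ]; then
            printf '%s\n' "$path"
        fi
    done
    return 0
}

# Ask each control master whether it is still running.
# The directory defaults to ~/.ansible/cp, an assumption taken from the
# ControlPath in the log output.
check_masters() {
    dir=${1:-"$HOME/.ansible/cp"}
    list_control_sockets "$dir" | while IFS= read -r sock; do
        # -O check queries the master behind the socket named by -S.
        if ssh -F /dev/null -S "$sock" -O check placeholder-host </dev/null 2>/dev/null; then
            echo "alive: $sock"
        else
            echo "stale: $sock"
        fi
    done
}
```

Running check_masters with no argument right after a failed play would show whether a master for the failing host is still around; "ssh -S <socket> -O exit placeholder-host" should then ask a leftover master to shut down before the next run.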
http://koji.fedoraproject.org/koji/buildinfo?buildID=619696

Bottom line, I'm out of troubleshooting steps, not sure what the impact of the workaround is, and I think someone who has more depth should take a look. I'm cc'ing ansible-devel because I wasn't sure what the right forum for this sort of issue was.

Hopefully this was clear!

Cheers,
-Matt M
