I've been having issues running Ansible against AIX, specifically with the
copy/template modules.

Periodically, copy/template plays will hang, either for a long time (read:
hours; leave it overnight and it might be done the next day) or
indefinitely. After reviewing the debug output from a number of these
instances, the problem appears to be in the 'checksum' function in the
sh.py code under runner. Below is example debug output showing where the
copy/template module hangs:

<aix14.mgmt.loc> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o 
ControlPersist=60s -o 
ControlPath="/home/ansible/.ansible/cp/ansible-ssh-%h-%p-%r" -o Port=22 -o 
KbdInteractiveAuthentication=no -o 
PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey 
-o PasswordAuthentication=no -o ConnectTimeout=10 aix14.mgmt.loc /bin/sh -c 
'sudo -k && sudo -H -S -p "[sudo via ansible, 
key=sfknwylttinwgjiawaunhugtrjbqdymg] password: " -u root /bin/sh -c 
'"'"'echo SUDO-SUCCESS-sfknwylttinwgjiawaunhugtrjbqdymg; rc=flag; [ -r 
"/etc/ntp.conf" ] || rc=2; [ -f "/etc/ntp.conf" ] || rc=1; [ -d 
"/etc/ntp.conf" ] && rc=3; python -V 2>/dev/null || rc=4; [ x"$rc" != 
"xflag" ] && echo "${rc} /etc/ntp.conf" && exit 0; (python -c 
'"'"'"'"'"'"'"'"'import hashlib; print(hashlib.sha1(open("/etc/ntp.conf", 
"rb").read()).hexdigest())'"'"'"'"'"'"'"'"' 2>/dev/null) || (python -c 
'"'"'"'"'"'"'"'"'import sha; print(sha.sha(open("/etc/ntp.conf", 
"rb").read()).hexdigest())'"'"'"'"'"'"'"'"' 2>/dev/null) || (echo "0 
/etc/ntp.conf")'"'"''
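Unpacking the nested quoting, the remote side of that command checks the
destination file, prints an error code if something is off, and otherwise
hashes the file with Python's hashlib.sha1. The rough Python re-creation
below exists only to make those codes easier to read; the function name
`remote_checksum` and its `python_available` flag are hypothetical, not
part of Ansible. Because each later shell test overwrites `rc`, a directory
reports 3 and a missing file reports 1.

```python
import hashlib
import os


def remote_checksum(path, python_available=True):
    """Hypothetical re-creation of the shell snippet in the debug output.

    Later checks overwrite earlier ones, just like the successive
    `rc=` assignments in the shell version.
    """
    rc = None                                # shell: rc=flag
    if not os.access(path, os.R_OK):
        rc = 2                               # [ -r path ] || rc=2
    if not os.path.isfile(path):
        rc = 1                               # [ -f path ] || rc=1
    if os.path.isdir(path):
        rc = 3                               # [ -d path ] && rc=3
    if not python_available:
        rc = 4                               # python -V fails -> rc=4
    if rc is not None:
        return "%d %s" % (rc, path)          # echo "${rc} path"; exit 0
    try:
        # Per the debug output, this read-and-hash step is roughly where
        # the session hangs on AIX.
        with open(path, "rb") as f:
            return hashlib.sha1(f.read()).hexdigest()
    except Exception:
        # The shell also tries the old `sha` module before this final
        # fallback; that step is omitted here.
        return "0 %s" % path
```

On a readable regular file it returns the 40-character hex SHA-1; on a
directory such as /etc it returns "3 /etc".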

This happens during random copy/template plays, not necessarily for the
same file as in the example above. The issue is reproducible, but not
consistently: 1 in 5 runs or more may hit it. The file itself appears to
copy over successfully, and then the session hangs. If I run "who -u" on
the AIX host and then "kill <pid>" the PID of the SSH session, the
playbook continues. I can confirm this happens with SFTP and with
"scp_if_ssh = True". It also happens with "pipelining = True" configured.

After digging around on the interwebs, I found a handful of references to
issues with the version of Python that IBM ships in its AIX Toolbox for
Linux Applications. The version we're using is from
http://www.perzl.org/aix/, which doesn't suffer from the same issues
(see https://github.com/ansible/ansible-modules-core/issues/80). I tried
substituting 'hashlib.sha1' with 'hashlib._md5' and was able to reproduce
the same hang. Following some references online from other folks using
Ansible to manage AIX, I've symlinked /bin/md5sum to /bin/csum; this did
not fix our issues either. I can also occasionally reproduce the issue
when running a single ad-hoc ansible command using the copy module.
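Since the hang only shows up in some runs, a small loop that repeats a
command with a wall-clock limit can help put a number on "1 in 5 runs or
more". This is a sketch, assuming a Python 2.7 or 3.x interpreter on the
control host; `count_hangs` and its parameters are hypothetical names,
and the 300-second default is an arbitrary cutoff.

```python
import os
import subprocess
import time


def count_hangs(cmd, runs=10, timeout=300.0, poll_interval=0.1):
    """Run `cmd` `runs` times and return how many exceeded `timeout` seconds."""
    hangs = 0
    with open(os.devnull, "wb") as devnull:
        for _ in range(runs):
            proc = subprocess.Popen(cmd, stdout=devnull, stderr=devnull)
            deadline = time.time() + timeout
            # Poll instead of using wait(timeout=...), which Python 2.7 lacks.
            while proc.poll() is None:
                if time.time() > deadline:
                    # Only the direct child is killed; any ssh processes it
                    # spawned may linger and need cleaning up separately.
                    proc.kill()
                    proc.wait()
                    hangs += 1
                    break
                time.sleep(poll_interval)
    return hangs
```

For example, `count_hangs(["ansible", "aix14.mgmt.loc", "-s", "-m",
"copy", "-a", "src=ntp.conf dest=/etc/ntp.conf"], runs=20)`, where the
src path is only a placeholder.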

Below is truss output from an AIX box where this issue occurs; it's a
truss against the ssh process of the user connected in from Ansible. I'm
by no means an expert at reading truss output, but it appears that
/bin/sh is called, it forks off a subprocess, the parent almost
immediately receives a SIGCHLD (which normally means the child has
exited), and then the process hangs at "close(8) (sleeping...)". This is
where it will hang for a looooonnnnggg time. The forked PID (24379542 in
the example below) ends up in a '<defunct>' state.

kwrite(4, "\0\00304 / b i n / s h  ".., 776)    = 776
kfcntl(7, F_DUPFD, 0x00000000)                  = 9
kfcntl(7, F_DUPFD, 0x00000000)                  = 10
sigprocmask(0, 0xF02B4970, 0xF02B4978)          = 0
kfork()                                         = 24379542
thread_setmymask_fast(0x00000000, 0x00000000, 0x00000000, 0xD052A400, 0x00000000, 0x11960029, 0x00000000) = 0x00000000
    Received signal #20, SIGCHLD [caught]
sigprocmask(2, 0xF02B4970, 0x2FF21E80)          = 0
_sigaction(20, 0x00000000, 0x2FF21F30)          = 0
thread_setmymask_fast(0x00080000, 0x00000000, 0x00000000, 0x11960029, 0x00000003, 0x00000000, 0x00000000) = 0x00000000
kwrite(6, "\0", 1)                              = 1
ksetcontext_sigreturn(0x2FF21FE0, 0x2FF22FF8, 0x2002D0D0, 0x0000D032, 0x00000003, 0x00000000, 0x00000000)
close(8)                        (sleeping...)
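For anyone less familiar with '<defunct>': a child that has exited stays
in the process table as a zombie until its parent calls wait()/waitpid(),
and SIGCHLD is how the kernel notifies the parent that the child exited.
The generic Python sketch below (not sshd or Ansible code;
`make_and_reap_zombie` is a hypothetical name) walks through the same
sequence. One reading of the truss, then, is that the parent never gets
around to reaping 24379542 because it is stuck in close(8).

```python
import os
import time


def make_and_reap_zombie(hold_seconds=0.5):
    """Fork a child that exits at once, leave it unreaped, then reap it."""
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately; the kernel sends SIGCHLD to the parent.
        os._exit(7)
    # Parent: until waitpid() runs, `ps` lists the child as <defunct>.
    time.sleep(hold_seconds)
    reaped_pid, status = os.waitpid(pid, 0)  # reaping clears the zombie
    return reaped_pid == pid, os.WEXITSTATUS(status)
```

Calling `make_and_reap_zombie(30)` and running `ps -ef | grep defunct` in
another window during those 30 seconds shows the zombie entry; the call
then returns `(True, 7)`.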

In the interest of disclosing everything, I've also noticed odd behavior
with the 'w' command when trying to determine whether Ansible has an SSH
session open on a host where a playbook is hanging. 'w' hangs for a few
seconds when it reaches the user who is logged in and running the Ansible
playbook. Running truss against 'w' gives the output below: the command
tries to open the user's pts to get its status, the kopen() blocks, and a
SIGALRM then interrupts it with EINTR, which apparently means the call was
taking too long to respond:

kopen("/dev/pts/4", O_RDONLY|O_NONBLOCK) (sleeping...)
kopen("/dev/pts/4", O_RDONLY|O_NONBLOCK)        Err#4  EINTR
    Received signal #14, SIGALRM [caught]
_sigaction(14, 0x0FFFFFFFFFFFEEB0, 0x0FFFFFFFFFFFEEE0) = 0
ksetcontext_sigreturn(0x0FFFFFFFFFFFF000, 0x0000000000000000, 
0x0FFFFFFFFFFFFFE8, 0x800000000000D032, 0x3FFC000000000003, 
0x00000000000000E8, 0x0000000000000000, 0x0000000000000000)
statx("/dev/pts/4", 0x0FFFFFFFFFFFF618, 176, 0) = 0
incinterval(0, 0x0FFFFFFFFFFFF4F8, 0x0FFFFFFFFFFFF518) = 0
statx("/dev/pts", 0x0FFFFFFFFFFFF618, 176, 0)   = 0
statx("/dev/pts/4", 0x0FFFFFFFFFFFF638, 176, 0) = 0
ansible  pts/4       03:15PM         36         0         0 -
kwrite(1, " a n s i b l e     p t s".., 62)     = 62
kread(3, "\0\0\0\0\0\0\0\0\0\0\0\0".., 4096)    = 1136
_sigaction(14, 0x0FFFFFFFFFFFF4F0, 0x0FFFFFFFFFFFF520) = 0
incinterval(0, 0x0FFFFFFFFFFFF4F8, 0x0FFFFFFFFFFFF518) = 0
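The truss suggests 'w' guards the open() of each tty with an alarm, so a
terminal that does not respond costs a few seconds instead of hanging 'w'
forever. The Python sketch below illustrates that pattern; it is a guess
at the technique, not w's actual source, and `open_with_alarm` is a
hypothetical name. A FIFO with no writer stands in for the stuck
/dev/pts/4, since opening one read-only blocks. Note that 'w' passed
O_NONBLOCK and still slept, so the real pts is misbehaving in a way a FIFO
only approximates.

```python
import os
import signal


class OpenTimedOut(Exception):
    """Raised from the SIGALRM handler to abandon a blocked open()."""


def _on_alarm(signum, frame):
    # Raising here stops the blocked call; on Python 3.5+ a handler that
    # simply returned would let os.open() be retried (PEP 475).
    raise OpenTimedOut()


def open_with_alarm(path, seconds=1):
    """Return True if `path` opens read-only within `seconds`, else False."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        fd = os.open(path, os.O_RDONLY)
    except OpenTimedOut:
        return False
    finally:
        signal.alarm(0)                          # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler
    os.close(fd)
    return True
```

Like the EINTR in the truss, the timeout surfaces as an interrupted
open() rather than as an error from the device itself.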

My environment is as follows:

Ubuntu 12.04
Ansible 1.8.2 (installed from the Ansible PPA)
AIX 7.1 (reproduced for sure on TL2SP4 and TL1SP0)
Python 2.7.5

-- 
You received this message because you are subscribed to the Google Groups 
"Ansible Project" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/ansible-project/19b02ae3-baeb-4c43-a9dc-a80b8454bf32%40googlegroups.com.
