Hi.
I have a small webhook app that is kicked off by curl and runs
ansible-playbook. I have noticed some weirdness where my webhook would show
the Ansible run as completed successfully, but curl would return 500 or
504. I have tried my best to debug, but the furthest I could get is
isolating the check_call running ansible-playbook as the problem.
My coworker noticed defunct ssh processes on the same machine and they seem
to coincide with ansible-playbook runs. They only go away after restarting
the webhook app. Since I can't seem to figure out why curl is failing with
500/504, I thought I'd try to solve the defunct ssh problem in the
hope that it is related.
Here is the ansible-playbook call:
json_data = request.get_json(force=True)
try:
    app_name = json_data['app_name']
    app_env = json_data['app_env']
except KeyError:
    return 'Please specify app_name and app_envs', 400
play = '%s_%s' % (app_name, app_env)
inventory = 'inventory/%s' % play
tag = 'deploy'
try:
    check_call(["ansible-playbook", "-i", inventory,
                "infra_{}.yml".format(play), "--tags", tag],
               cwd=workspace)
except Exception as e:
    logger.exception(e)
    logger.info(datetime.now())
    return 'Failure. See logs for error.', 500
else:
    logger.info(datetime.now())
    return 'Success!', 200
It seems that some playbooks result in defunct ssh processes and some
don't. I can't find any difference in how the playbooks use ssh, as they
all just run Docker containers. This is what I find immediately after a
run where Ansible succeeds but curl fails with 500/504:
$ ps -ef | grep ssh
root 13925 1 0 Mar07 ? 00:00:14 /usr/sbin/sshd -D
root 17119 13925 0 18:58 ? 00:00:00 sshd: mmorris [priv]
mmorris 17163 17119 0 18:58 ? 00:00:00 sshd: mmorris@pts/1
root 17345 17243 0 19:07 ? 00:00:00 [ssh] <defunct>
root 17346 17243 0 19:07 ? 00:00:00 ssh: /root/.ansible/cp/ansible-ssh-52.4.115.46-22-root [mux]
root 17478 13925 0 19:09 ? 00:00:00 sshd: mmorris [priv]
mmorris 17521 17478 0 19:09 ? 00:00:00 sshd: mmorris@pts/3
Then, less than 30 seconds later, the Ansible-related ssh process also
turns defunct:
$ ps -ef | grep ssh
root 13925 1 0 Mar07 ? 00:00:14 /usr/sbin/sshd -D
root 17119 13925 0 18:58 ? 00:00:00 sshd: mmorris [priv]
mmorris 17163 17119 0 18:58 ? 00:00:00 sshd: mmorris@pts/1
root 17345 17243 0 19:07 ? 00:00:00 [ssh] <defunct>
root 17346 17243 0 19:07 ? 00:00:00 [ssh] <defunct>
root 17478 13925 0 19:09 ? 00:00:00 sshd: mmorris [priv]
mmorris 17521 17478 0 19:09 ? 00:00:00 sshd: mmorris@pts/3
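To see which parent is leaving these processes unreaped, the zombies can be listed together with their parent PIDs. The following is a minimal sketch, assuming a Linux procps `ps` that accepts `-eo pid,ppid,stat,comm`; the function names `parse_zombies` and `list_zombies` are made up for illustration and are not part of the webhook app.

```python
import subprocess


def parse_zombies(ps_output):
    """Return (pid, ppid, command) for every row whose STAT starts with 'Z'."""
    zombies = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)  # pid, ppid, stat, rest of line
        if len(fields) < 4:
            continue
        pid, ppid, stat, comm = fields
        if stat.startswith("Z"):  # 'Z' marks a defunct (zombie) process
            zombies.append((int(pid), int(ppid), comm.strip()))
    return zombies


def list_zombies():
    """Run ps and report the zombies currently present on this machine."""
    out = subprocess.check_output(
        ["ps", "-eo", "pid,ppid,stat,comm"], universal_newlines=True
    )
    return parse_zombies(out)
```

Calling `list_zombies()` right after a deploy and looking up the reported parent PID with `ps -p <ppid>` would show which process is the one that still has to wait on the defunct ssh children.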
This has been causing me headaches for a while now, as I have CI/CD runs
failing even though the deploy itself with Ansible is successful. Any
information or advice for figuring this out would be VERY much appreciated!
Maybe there is something I can do instead of just check_call so that
whatever is going on with the ssh processes won't affect the exit code
passed to the app?
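One variant of the call that could be tried is sketched below, assuming Python 3.5 or later. It runs the command in its own session, detaches stdin, sends output to a log file instead of a pipe (so a lingering ssh mux process cannot keep a pipe open), and returns the exit code instead of raising. The name `run_and_log` and the `log_path` parameter are hypothetical, and whether this changes the 500/504 behaviour is untested; like check_call, it only reports the exit code of the direct child.

```python
import subprocess


def run_and_log(cmd, log_path, cwd=None):
    """Run cmd, append its output to log_path, and return its exit code."""
    with open(log_path, "ab") as log:
        proc = subprocess.run(
            cmd,
            cwd=cwd,
            stdin=subprocess.DEVNULL,   # do not share the app's stdin
            stdout=log,                 # a file, not a pipe, so nothing blocks on EOF
            stderr=subprocess.STDOUT,   # keep stderr in the same log
            start_new_session=True,     # separate session from the web app
        )
    return proc.returncode
```

In the handler above this would be called as something like `rc = run_and_log(["ansible-playbook", "-i", inventory, "infra_{}.yml".format(play), "--tags", tag], "deploy.log", cwd=workspace)`, returning 500 only when `rc != 0`; the log file name here is just a placeholder.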