[ansible-project] Re: running Ansible-playbook through python check_call results in defunct ssh processes

Marcus Morris Sun, 01 May 2016 13:07:58 -0700

Here is what the curl request looks like:

curl -i -f -u "$DEPLOY_USER:$DEPLOY_PASS" -X POST https://deploy.company.com \
    -d '{"app_name":"api","app_env":"'"$ENV"'"}'
curl: (22) The requested URL returned error: 504


and the webhook logs:

[DEPRECATION WARNING]: Instead of sudo/sudo_user, use become/become_user and
make sure become_method is 'sudo' (default). This feature will be removed in a
future release. Deprecation warnings can be disabled by setting
deprecation_warnings=False in ansible.cfg.

PLAY [Configure instance(s)] ***************************************************

TASK [setup] *******************************************************************
ok: [52.4.115.46]

TASK [api : create api non-prod test container] ************************
changed: [52.4.115.46]

TASK [api : pause] *********************************************************
Pausing for 5 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
ok: [52.4.115.46]

TASK [api : run test] ******************************************************
ok: [52.4.115.46]

TASK [api : create api non-prod containers] ****************************
changed: [52.4.115.46]

RUNNING HANDLER [api : stop test container] ********************************
changed: [52.4.115.46]

RUNNING HANDLER [api : remove test container] ******************************
changed: [52.4.115.46]

RUNNING HANDLER [api : DEL redis keys] *************************************
changed: [52.4.115.46]

PLAY RECAP *********************************************************************
52.4.115.46                : ok=26   changed=5    unreachable=0    failed=0

INFO:waitress:2016-05-01 19:44:48.458123

Running *Ansible 2.0.0.2*

On Sunday, May 1, 2016 at 2:18:55 PM UTC-5, Marcus Morris wrote:
>
> Hi.
>
> I have a small webhook app that is kicked off by curl and runs 
> ansible-playbook. I have noticed some weirdness where my webhook would show 
> the Ansible run as completed successfully, but curl would return 500 or 
> 504. I have tried my best to debug, but the furthest I could get is 
> isolating the check_call running ansible-playbook as the problem.
>
> My coworker noticed defunct ssh processes on the same machine and they 
> seem to coincide with ansible-playbook runs. They only go away after 
> restarting the webhook app. Since I can't seem to figure out why curl is 
> failing with 500/504, I thought I'd try to solve the defunct ssh 
> problem in the hopes it is related.
>
> Here is the ansible-playbook call:
>
>     json_data = request.get_json(force=True)
>
>     try:
>         app_name = json_data['app_name']
>         app_env = json_data['app_env']
>     except KeyError:
>         return 'Please specify app_name and app_env', 400
>
>     play = '%s_%s' % (app_name, app_env)
>     inventory = 'inventory/%s' % play
>     tag = 'deploy'
>
>     try:
>         check_call(["ansible-playbook", "-i", inventory,
>                     "infra_{}.yml".format(play), "--tags", tag],
>                    cwd=workspace)
>     except Exception as e:
>         logger.exception(e)
>         logger.info(datetime.now())
>         return 'Failure. See logs for error.', 500
>     else:
>         logger.info(datetime.now())
>         return 'Success!', 200
>
> It seems that some playbooks result in defunct ssh processes and some 
> don't. I can't seem to figure out a difference between the playbooks that 
> involve ssh as they are all just running docker containers. This is what I 
> find immediately after a run that succeeds, but curl fails with 500/504:
>
> $ ps -ef | grep ssh
>
> root     13925     1  0 Mar07 ?        00:00:14 /usr/sbin/sshd -D
> root     17119 13925  0 18:58 ?        00:00:00 sshd: mmorris [priv]
> mmorris  17163 17119  0 18:58 ?        00:00:00 sshd: mmorris@pts/1
> root     17345 17243  0 19:07 ?        00:00:00 [ssh] <defunct>
> root     17346 17243  0 19:07 ?        00:00:00 ssh: /root/.ansible/cp/ansible-ssh-52.4.115.46-22-root [mux]
> root     17478 13925  0 19:09 ?        00:00:00 sshd: mmorris [priv]
> mmorris  17521 17478  0 19:09 ?        00:00:00 sshd: mmorris@pts/3
>
> And then, after less than 30 seconds, the Ansible-related process also 
> turns defunct:
>
> $ ps -ef | grep ssh
> root     13925     1  0 Mar07 ?        00:00:14 /usr/sbin/sshd -D
> root     17119 13925  0 18:58 ?        00:00:00 sshd: mmorris [priv]
> mmorris  17163 17119  0 18:58 ?        00:00:00 sshd: mmorris@pts/1
> root     17345 17243  0 19:07 ?        00:00:00 [ssh] <defunct>
> root     17346 17243  0 19:07 ?        00:00:00 [ssh] <defunct>
> root     17478 13925  0 19:09 ?        00:00:00 sshd: mmorris [priv]
> mmorris  17521 17478  0 19:09 ?        00:00:00 sshd: mmorris@pts/3
>
> This has been causing me headaches for a while now, as I have CI/CD runs 
> failing even though the deploy itself with Ansible is successful. Any 
> information or advice for figuring this out would be VERY much appreciated!
>
> Maybe there is something I can do instead of just check_call so that 
> whatever is going on with the ssh processes won't affect the exit code 
> passed to the app?
>
>
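
One possible alternative to check_call, as a minimal sketch only: run the
command with subprocess.Popen, wait for it, and map the exit code to the
response explicitly. The helper names run_and_get_exit_code and status_for are
hypothetical, and the stand-in commands below replace the real
ansible-playbook call. wait() only reaps the direct child, so ssh processes
started further down the tree are outside its reach; whether this changes the
defunct processes or the 500/504 responses is not established here.

```python
import subprocess
import sys


def run_and_get_exit_code(argv, cwd=None):
    """Run argv, wait for it to exit, and return its exit code.

    Unlike check_call, a non-zero exit does not raise; the caller decides
    how to turn the code into an HTTP status.
    """
    proc = subprocess.Popen(
        argv,
        cwd=cwd,
        stdin=subprocess.DEVNULL,  # the child never waits on our stdin
        close_fds=True,            # do not pass the app's open sockets to the child
    )
    # wait() blocks until the direct child exits and reaps it.
    return proc.wait()


def status_for(exit_code):
    """Map an exit code to the (body, status) pairs used in the handler."""
    if exit_code == 0:
        return 'Success!', 200
    return 'Failure. See logs for error.', 500


if __name__ == "__main__":
    # Stand-in commands; in the webhook, argv would be the ansible-playbook call.
    ok = run_and_get_exit_code([sys.executable, "-c", "raise SystemExit(0)"])
    bad = run_and_get_exit_code([sys.executable, "-c", "raise SystemExit(3)"])
    print(status_for(ok), status_for(bad))
```

In the handler, the call would take the same argument list and cwd=workspace
that check_call receives above, and the result of status_for would be returned
after logging.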

