Several things going on.
#0: Your post/email is dated May 9, but I didn't see this until Tuesday
May 14. This is not related to what you were asking about, but the irony
of a "timeout" question taking ~5 days to land is too delicious not to
mention.
#1: I have no idea what your ssh_timeout_wait_for variable might be set
to. Maybe it isn't set, so the default(5) may be kicking in. I don't
think it matters in any case, though, because of #2.
#2: Of all the possible modules you could invoke for your "Check host
reachability" task, wait_for_connection is about as far as you can get
from what you're trying to accomplish. That module assumes the host in
question is actually down - probably because you just rebooted it in a
prior task - and that you want to wait for it to come back up before
proceeding. And that's what it's doing: waiting until the host comes
back up or until the end of time (which, fortunately, comes sooner in
the lifespan of this task than it does for us out here in the real
world). Change this to ansible.builtin.ping (or almost anything else) to
get the behavior you seek.
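For what it's worth, a minimal sketch of that swap might look like the
following. It reuses your play, task, and register names. The
ansible_timeout value, the end_host meta action (in place of end_play),
and the "unreachable" key I test on are my assumptions about what fits
your setup; I have not run this against your inventory, so treat it as
a starting point rather than a tested fix.

```yaml
- hosts: all
  gather_facts: no
  vars:
    # Assumption: per-attempt connection timeout, in seconds, for the
    # connection plugin. Adjust or drop it if ansible.cfg already sets one.
    ansible_timeout: 5
  pre_tasks:
    - name: Check host reachability
      ansible.builtin.ping:
      # Keep the host in the play even if the connection fails, so the
      # next task can decide what to do with it.
      ignore_unreachable: true
      register: host_is_reachable

    - name: Drop this host from the play if it is unreachable
      # end_host ends the play for the current host only;
      # end_play would end it for every host.
      ansible.builtin.meta: end_host
      when: host_is_reachable.unreachable | default(false)
  roles:
    - role: roles/somerole
```

The design choice here is to let the ping fail once and move on, rather
than poll for the host to come back.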
#3: "timeout" is one of the most horribly encapsulated concepts in
Ansible (and a lot of other software). It's used to describe aspects of
- connections,
- running times for
  - tasks
  - plays
  - playbooks
  - workflows
- pauses
- async task management
- before iterations
- between iterations
- plus whatever crazy foo any particular plugin might want to do with
  a bit of spare time
- …
I say "encapsulated concepts" because, while all those things are
mentioned somewhere in the docs, there's no single place you can look
and see them all laid out side by side, compared and contrasted, with
the interplay between them discussed. To be fair, none of the specific
docs where some timeout is discussed should be the canonical home of
such an overview. That "General Discussion of All Things Timing" page
is yet to be written.
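To make that list a little more concrete, here is a rough sketch of a
few of those knobs sitting side by side in one play. It is nowhere near
exhaustive, the numbers are arbitrary, and the "sleep" commands are
placeholders I made up for illustration. Limits on plays, playbooks,
and workflows are set elsewhere and are not shown.

```yaml
- hosts: all
  gather_facts: no
  vars:
    # Connection: seconds the connection plugin waits for the host.
    ansible_timeout: 10
  tasks:
    - name: Task running time
      ansible.builtin.command: sleep 1
      # The task is failed if it runs longer than this many seconds.
      timeout: 30

    - name: Async task management
      ansible.builtin.command: sleep 15
      async: 60   # maximum seconds the background job may run
      poll: 5     # seconds between status checks

    - name: Retries
      ansible.builtin.command: sleep 1
      register: result
      until: result.rc == 0
      retries: 3
      delay: 2    # seconds to wait between attempts

    - name: Between iterations
      ansible.builtin.debug:
        msg: "{{ item }}"
      loop: [1, 2, 3]
      loop_control:
        pause: 1  # seconds to pause between loop items

    - name: A plain pause
      ansible.builtin.pause:
        seconds: 5
```

Each of these is documented on its own, but they count different
things, which is why changing one rarely has the effect you expect on
another.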
To get a feel for ping vs wait_for_connection, consider this snippet of
bash script. You'll need to substitute actual host names for
"reachable.host" and "unreachable.host". The tl;dr (too long, didn't
run) upshot is: you don't want to use wait_for_connection as a
reachability test.
for module in ansible.builtin.ping ansible.builtin.wait_for_connection ; do
  for ct in 2 20 ; do
    printf "module: %s with connection_timeout: %d\n" "$module" "$ct"
    time ansible all -i reachable.host,unreachable.host, -m "$module" \
      -e connection_timeout="$ct" -v
  done
done
On 5/9/24 7:56 AM, Ismail Ett wrote:
Hey everyone,
I have multiple playbooks that run on a schedule on lots of hosts,
some of which are sometimes turned off for cost saving.
Almost all jobs on AWX are marked as failed because there is at least
1 host that is powered off. This is not very aesthetically pleasing,
and it also makes it hard to know when a job has actually failed on an
important task on a host.
Another inconvenience is that the jobs take a lot of time to execute
when there are lots of hosts that are unreachable, because ansible
hangs on them and waits for the connection. I tried decreasing the
timeout settings in our ansible.cfg to 20 seconds, which did help a bit,
but the hanging on (turned off) hosts still takes a lot of waiting
before the tasks carry on with the other hosts.
The solution for me was:
- Add a pre_task on each playbook that will run a wait_for_connection task
- Check if it fails; if so, I end the play without proceeding, like so:
```
- hosts: all
  gather_facts: no
  pre_tasks:
    - name: Check host reachability
      wait_for_connection:
        timeout: "{{ ssh_timeout_wait_for | default(5) }}"
        sleep: 1
      ignore_errors: true
      ignore_unreachable: true
      register: host_is_reachable
    - name: End play if host is unreachable
      meta: end_play
      when: host_is_reachable.failed
  roles:
    - role: roles/somerole
```
This seems to fix my first problem of jobs being marked as failed if
one host is unreachable.
But it does not fix my second problem, which is ansible hanging on the
unreachable hosts for so long.
In the wait_for_connection task I have set the timeout to 5 seconds,
expecting that ansible would try to reach the host and, if it fails to
do so in 5 seconds, end the play. But it does not do that.
Instead ansible hangs on the unreachable host for more than 2 minutes,
then throws a warning like this:
[WARNING]: Unhandled error in Python interpreter discovery for host
172.12.23.34: Failed to connect to the host via ssh: ssh: connect to host
And then waits some extra time and then the output of the
wait_for_connection task gets printed like so:
TASK [Check host reachability]
*************************************************
fatal: [172.12.23.34]: FAILED! => {"changed": false,
"elapsed": 169, "msg": "timed out waiting for ping module test:
Data could not be sent to remote host \"172.12.23.34\".
Make sure this host can be reached over ssh: ssh:
connect to host 172.12.23.34 port 22: Connection timed out\r\n"}
...ignoring
As you can see in the task output, the wait_for_connection task alone
waited for 169 seconds even after I specified a much lower value.
Am I doing something wrong? Is this the default behavior?
Extra questions:
- Is this because ansible tries to gather facts before even starting
the wait_for task? That was the reason I put the wait_for_connection
in a pre_task.
- Is the 169 seconds random, or does it have to do with the default
ssh timeout settings? I get different values every time I run the
playbook, so I don't think it does.
- Please share with me any alternative approach to fix the first 2
problems.
Any help would be appreciated :)
--
You received this message because you are subscribed to the Google
Groups "Ansible Project" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to ansible-project+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/ansible-project/09688194-d7f2-4d89-9935-d7b8c326dd6cn%40googlegroups.com
<https://groups.google.com/d/msgid/ansible-project/09688194-d7f2-4d89-9935-d7b8c326dd6cn%40googlegroups.com?utm_medium=email&utm_source=footer>.
--
Todd