Hi all -

I've been pretty happy running Ansible for a few months now.  The one major 
thorn in my side is failed tasks.  Our fleet of VMs is not very large, but 
apparently is large enough (or our playbook is long enough) that we hit at 
least one spurious SSH error (e.g. "SSH Error: mux_client_hello_exchange: 
write packet: Broken pipe"), or, more rarely, I'll hit a spurious 500 from 
a third party service (e.g. adding or removing our VMs to/from load 
balancers via a cloud API).

What's the best practice for dealing with these kinds of transient 
failures?  It seems to me that something like "sleep X seconds, then 
retry, up to Y times" would work quite well, but it isn't obvious to me how 
to make that happen.
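
To be concrete about the behaviour I have in mind, here is a rough Python 
sketch of "sleep X seconds, then retry, up to Y times".  The retry() helper 
and its parameter names are made up purely for illustration; this is not an 
Ansible API, just the semantics I'd like a task to have.

```python
import time


def retry(action, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Call action() until it succeeds or the attempts run out.

    Hypothetical helper for illustration only (not part of Ansible).
    attempts      -- the "Y": total number of tries, must be >= 1
    delay_seconds -- the "X": pause between a failure and the next try
    sleep         -- injectable so the wait can be stubbed out in tests
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            # Out of tries: let the last error propagate to the caller.
            if attempt == attempts:
                raise
            # Otherwise wait X seconds before trying again.
            sleep(delay_seconds)
```

So a task that fails spuriously once or twice would still succeed overall, 
while a task that fails every time would still be reported as failed after 
the last attempt.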

I'm aware of the wait_for module, but I don't think that really helps in 
this situation since the problem isn't that a resource is actually missing; 
it's just spurious failures.

Any suggestions?

Thanks!
- Ian
