We use Ansible to deploy code updates across a small fleet (~8 machines). At least a few times a week, we run into network hiccups that cause the SSH connection to a random EC2 instance to fail, which in turn fails the entire playbook run. Sometimes this leaves us with an incomplete deploy, which is no fun. In almost all cases we can immediately re-launch the playbook and the errant instance is fine the second time around. These appear to be very short interruptions, and there's no rhyme or reason as to which instance is affected. It's usually only one instance out of our fleet at a time.
What kind of strategies is everyone using to deal with these sorts of sporadic SSH failures that cause the whole playbook run to fail prematurely?
