I don't know if retry works well with serial.

On Mon, Aug 10, 2015 at 4:11 PM, Ian Rose <[email protected]> wrote:
> My understanding of retry files (which could certainly be wrong) is that
> they merely limit the hosts that are included in the run. Which I don't
> think will work for me, although perhaps this indicates that my playbook is
> not set up well. Here is a simplified version of my site.yml:
>
> - name: copy new files to all nodes
>   hosts: all
>   tasks:
>     - include: tasks/deploy_files.yml
>
> - name: configure and deploy backend type foo
>   hosts: tag_foo
>   roles:
>     - foo
>
> - name: configure and deploy backend type bar
>   hosts: tag_bar
>   roles:
>     - bar
>
> - name: configure and deploy backend type baz
>   hosts: tag_baz
>   roles:
>     - baz
>
> (etc, for 7 total backend types)
>
> - name: clean up old deployments from all nodes
>   hosts: all
>   tasks:
>     - include: tasks/remove_old_deployments.yml
>
> So, given this structure, pretend that the "foo" step went fine, but then
> some step during one of the "bar" backend deployments failed. Won't the
> retry file just contain that single host? (assuming we are running
> "serial: 1" for that task that failed) So if I reran using that file, I
> might get that "bar" host to deploy correctly, but I will totally miss all
> of the "baz" hosts and all other backends whose deployment tasks appear
> after the "bar" task.
>
> I suppose one option might be to break up this single site.yml into 7
> different playbooks, one for each backend type, and then execute them each
> in order, retrying each one as necessary if any errors occur. Would that be
> a better setup? That seems to be a bit silly, but maybe I'm wrong on
> that...
>
> Thanks,
> Ian
>
> On Monday, August 10, 2015 at 3:37:32 PM UTC-4, Brian Coca wrote:
>>
>> You can use the .retry files as a --limit to rerun the plays.
>>
>> On Mon, Aug 10, 2015 at 3:29 PM, Ian Rose <[email protected]> wrote:
>> > Hi all -
>> >
>> > I've been pretty happy running Ansible for a few months now. The one
>> > major thorn in my side is failed tasks. Our fleet of VMs is not very
>> > large, but apparently is large enough (or our playbook is long enough)
>> > that we hit at least one spurious SSH error (e.g. "SSH Error:
>> > mux_client_hello_exchange: write packet: Broken pipe"), or, more
>> > rarely, I'll hit a spurious 500 from a third party service (e.g. adding
>> > or removing our VMs to/from load balancers via a cloud API).
>> >
>> > What's the best practice for dealing with these kinds of transient
>> > failures? It seems to me that something like "sleep X seconds, then
>> > retry, up to Y times" would work quite well, but it isn't obvious to me
>> > how to make that happen.
>> >
>> > I'm aware of the wait_for module, but I don't think that really helps
>> > in this situation since the problem isn't that a resource is actually
>> > missing; it's just spurious failures.
>> >
>> > Any suggestions?
>> >
>> > Thanks!
>> > - Ian
>> >
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups "Ansible Project" group.
>> > To unsubscribe from this group and stop receiving emails from it, send
>> > an email to [email protected].
>> > To post to this group, send email to [email protected].
>> > To view this discussion on the web visit
>> > https://groups.google.com/d/msgid/ansible-project/e47c3c8a-817f-4933-b429-492a430b277f%40googlegroups.com.
>> > For more options, visit https://groups.google.com/d/optout.
>>
>> --
>> Brian Coca
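
For the "sleep X seconds, then retry, up to Y times" idea quoted above, a
minimal sketch using the per-task until/retries/delay keywords might look
like the following. The command path and the lb_result variable name are
hypothetical placeholders, not anything from the playbook in this thread.
As far as I know this only retries a task whose module reports failure
(such as a 500 from a cloud API); it is an assumption that it will not help
when the SSH connection itself drops.

```yaml
# Hypothetical task: register a VM with a load balancer, retrying on failure.
# /usr/local/bin/lb-register is a made-up command standing in for the real API call.
- name: add this VM to the load balancer
  command: /usr/local/bin/lb-register {{ inventory_hostname }}
  register: lb_result          # capture the result so "until" can inspect it
  until: lb_result.rc == 0     # stop retrying once the command exits cleanly
  retries: 5                   # "up to Y times"
  delay: 10                    # "sleep X seconds" between attempts
```

Because the retry lives on the task itself, the play keeps going through the
later "baz" plays instead of needing a rerun limited by a .retry file.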
--
Brian Coca
