Re: [ansible-project] Retrying failed tasks

Brian Coca Mon, 10 Aug 2015 13:40:37 -0700

I don't know whether the .retry file approach works well with serial.
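
For the "sleep X seconds, then retry, up to Y times" behaviour asked about in
the original message, the task-level until/retries/delay keywords are one
option. A minimal sketch, assuming a hypothetical load balancer script: the
command path, the task name and the lb_result variable are placeholders, not
anything taken from the playbook in this thread.

```yaml
# Hypothetical task: the script path and variable names are placeholders.
- name: remove this VM from the load balancer, retrying transient API errors
  command: /usr/local/bin/lb-remove {{ inventory_hostname }}
  register: lb_result          # capture the result so "until" can inspect it
  until: lb_result.rc == 0     # stop retrying once the command exits cleanly
  retries: 5                   # "up to Y times"
  delay: 10                    # "sleep X seconds" between attempts
```

This only re-runs a task whose result does not satisfy the until condition.
Whether it also covers SSH-level connection errors such as the
mux_client_hello_exchange one is something I would not assume.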

On Mon, Aug 10, 2015 at 4:11 PM, Ian Rose <[email protected]> wrote:
> My understanding of retry files (which could certainly be wrong) is that
> they merely limit the hosts that are included in the run.  Which I don't
> think will work for me, although perhaps this indicates that my playbook is
> not set up well.  Here is a simplified version of my site.yml:
>
> - name: copy new files to all nodes
>   hosts: all
>   tasks:
>   - include: tasks/deploy_files.yml
>
> - name: configure and deploy backend type foo
>   hosts: tag_foo
>   roles:
>   - foo
>
> - name: configure and deploy backend type bar
>   hosts: tag_bar
>   roles:
>   - bar
>
> - name: configure and deploy backend type baz
>   hosts: tag_baz
>   roles:
>   - baz
>
> (etc, for 7 total backend types)
>
> - name: clean up old deployments from all nodes
>   hosts: all
>   tasks:
>   - include: tasks/remove_old_deployments.yml
>
>
> So, given this structure, pretend that the "foo" play went fine, but then
> some task during one of the "bar" backend deployments failed.  Won't the
> retry file just contain that single host?  (assuming we are running "serial:
> 1" for the play that failed)  So if I reran using that file, I might get
> that "bar" host to deploy correctly, but I would totally miss all of the
> "baz" hosts and all other backends whose plays appear after the
> "bar" play.
>
> I suppose one option might be to break up this single site.yml into 7
> different playbooks, one for each backend type, and then execute them each
> in order, retrying each one as necessary if any errors occur.  Would that be
> a better setup?  That seems to be a bit silly, but maybe I'm wrong on
> that...
>
> Thanks,
> Ian
>
>
>
> On Monday, August 10, 2015 at 3:37:32 PM UTC-4, Brian Coca wrote:
>>
>> You can pass the .retry file to --limit (as --limit @<file>.retry) to rerun
>> the plays on just the hosts that failed.
>>
>> On Mon, Aug 10, 2015 at 3:29 PM, Ian Rose <[email protected]> wrote:
>> > Hi all -
>> >
>> > I've been pretty happy running Ansible for a few months now.  The one major
>> > thorn in my side is failed tasks.  Our fleet of VMs is not very large, but
>> > apparently is large enough (or our playbook is long enough) that we hit at
>> > least one spurious SSH error (e.g. "SSH Error: mux_client_hello_exchange:
>> > write packet: Broken pipe"), or, more rarely, I'll hit a spurious 500 from a
>> > third party service (e.g. adding or removing our VMs to/from load balancers
>> > via a cloud API).
>> >
>> > What's the best practice for dealing with these kinds of transient failures?
>> > It seems to me that something like "sleep X seconds, then retry, up to Y
>> > times" would work quite well, but it isn't obvious to me how to make that
>> > happen.
>> >
>> > I'm aware of the wait_for module, but I don't think that really helps in
>> > this situation since the problem isn't that a resource is actually missing;
>> > it's just spurious failures.
>> >
>> > Any suggestions?
>> >
>> > Thanks!
>> > - Ian
>> >
>> > --
>> > You received this message because you are subscribed to the Google Groups
>> > "Ansible Project" group.
>> > To unsubscribe from this group and stop receiving emails from it, send an
>> > email to [email protected].
>> > To post to this group, send email to [email protected].
>> > To view this discussion on the web visit
>> > https://groups.google.com/d/msgid/ansible-project/e47c3c8a-817f-4933-b429-492a430b277f%40googlegroups.com.
>> > For more options, visit https://groups.google.com/d/optout.
>>
>>
>>
>> --
>> Brian Coca




-- 
Brian Coca
