Re: [ansible-project] On replaying notifications after failures

Petros Moisiadis Wed, 19 Mar 2014 03:30:27 -0700

On 03/18/2014 04:42 PM, Michael DeHaan wrote:
> I wouldn't want the system to generate two separate files, but it
> could generate a new system for retries.
>
> We can think about this.
>
>
>
>
> On Tue, Mar 18, 2014 at 10:40 AM, Petros Moisiadis <[email protected]>
> wrote:
>
>     On 03/18/14 16:20, Michael DeHaan wrote:
>>     "Having it as a command line option does not help, because you do
>>     not know before running the command whether any of your tasks is
>>     going to fail, or how."
>>
>>     Incorrect, because you would use this when running the retry
>>     command only.
>>
>     Well, it did not occur to me that you could actually use that
>     option _after_ the failure. :-[
>     However, I think that a more controllable tool to run handlers
>     selectively could be more powerful for recovering from unexpected
>     deployment failures. For example, have ansible generate a file
>     with a list of notified but not executed handlers, which you could
>     edit as you wish and then pass to a '--handlers-file' option,
>     in a similar fashion to how limit files work for limiting hosts.
>
>>
>>
>>
>>
>>     On Tue, Mar 18, 2014 at 10:18 AM, Petros Moisiadis
>>     <[email protected]> wrote:
>>
>>         On 03/18/14 14:44, Michael DeHaan wrote:
>>>         "For example, if a task that changes a server's
>>>         configuration fails, forcing the execution of a handler that
>>>         reloads/restarts the server could lead to the server failing
>>>         to operate properly, or even to serve at all."
>>>
>>>         This is why it's a command line option to be used only when
>>>         desired.
>>>
>>         Having it as a command line option does not help, because you
>>         do not know before running the command whether any of your
>>         tasks is going to fail, or how.
>>
>>         I think that the correct analysis of the problem is like this:
>>
>>         You have designed a sequence of deployment tasks that should
>>         be run in a specific order. Your task specification language
>>         (ansible playbook) does not know what the best thing to do is
>>         if task execution is abnormally interrupted. Only you, the
>>         ansible user, will be able to know what should be done.
>>         Normally you expect everything to run fine, but, if something
>>         goes wrong (and things can still go wrong on production
>>         deployments, even after passed tests), you want your action
>>         to stop as early as possible. At the same time you want to
>>         have the tools that will help you to mitigate the problems
>>         caused by the abnormal interruption. And you want full
>>         control over these tools. You do not want the tools to decide
>>         for you on what should be done. You are the one to decide,
>>         and you can't do so before you actually see what failed and
>>         how.
>>
>>         So, you need a tool to run already notified handlers (or part
>>         of them), but you will use it only if it is good for the
>>         health of your system. You cannot decide about that until you
>>         actually see what happened.
>>
>>>
>>>
>>>
>>>         On Tue, Mar 18, 2014 at 8:22 AM, Petros Moisiadis
>>>         <[email protected]> wrote:
>>>
>>>             On 03/17/14 15:51, Michael DeHaan wrote:
>>>>             There was a post about this last week about adding a
>>>>             --force-handlers option.
>>>>
>>>>             This can be done, though we're currently chasing some
>>>>             other items.
>>>>
>>>>             Pull requests would be welcome.
>>>>
>>>>
>>>>
>>>>
>>>>             On Sat, Mar 15, 2014 at 1:12 PM, Julio Monteiro
>>>>             <[email protected]> wrote:
>>>>
>>>>                 Hello all,
>>>>
>>>>                 I am curious if/how Ansible plans on solving the
>>>>                 "replay notifications" issue. I am having the exact
>>>>                 same issue as reported on this StackOverflow
>>>>                 question (the author does a great job describing
>>>>                 the issue):
>>>>                 http://stackoverflow.com/questions/21538516/ansible-how-to-replay-notifications
>>>>
>>>>                 I find that it is easy to prevent that by using tasks
>>>>                 with "when:" statements instead of notifications. But
>>>>                 I really think this is a workaround: notifications are
>>>>                 a great and easy feature, and there should be a default
>>>>                 way to replay them, or at least a list of
>>>>                 queued-but-not-executed notifications whenever a
>>>>                 task fails.
>>>>
>>>>                 Thanks,
>>>>                   jmonteiro
>>>>                 -- 
>>>>                 You received this message because you are
>>>>                 subscribed to the Google Groups "Ansible Project"
>>>>                 group.
>>>>                 To unsubscribe from this group and stop receiving
>>>>                 emails from it, send an email to
>>>>                 [email protected]
>>>>                 <mailto:[email protected]>.
>>>>                 To post to this group, send email to
>>>>                 [email protected]
>>>>                 <mailto:[email protected]>.
>>>>                 To view this discussion on the web visit
>>>>                 
>>>> https://groups.google.com/d/msgid/ansible-project/64e1dbe8-5b36-4e6b-b521-f086d2952008%40googlegroups.com
>>>>                 
>>>> <https://groups.google.com/d/msgid/ansible-project/64e1dbe8-5b36-4e6b-b521-f086d2952008%40googlegroups.com?utm_medium=email&utm_source=footer>.
>>>>                 For more options, visit
>>>>                 https://groups.google.com/d/optout.
>>>
>>>             Usually the task that notifies a handler has made some
>>>             changes that affect the behavior of the systems the
>>>             handler acts on. If the notifying task fails, but its
>>>             handler is forced to run, then the behavior of the
>>>             involved systems could be unpredictable or unwanted. For
>>>             example, if a task that changes a server's configuration
>>>             fails, forcing the execution of a handler that
>>>             reloads/restarts the server could lead to the server
>>>             failing to operate properly, or even to serve at all. So,
>>>             I think that a '--force-handlers' option is quite risky
>>>             and could lead to unpredictable behavior. It would be
>>>             better to let users control the (selective) replaying of
>>>             the handlers only _after_ the failure occurs.


I like the idea of having a powerful "--retry @retryfile" option with
sensible defaults. The retry file could be as simple as a yaml file like
the following:

    ---
    hosts:
      - host_who_failed_1
      - host_who_failed_2
      - host_who_failed_3
    start_at: "The task that caused the abnormal interruption"
    notify:
      - A handler already notified before the abnormal interruption
      - Another handler already notified before the abnormal interruption
    tags: all

The "hosts" list would be auto-generated with the hosts that have
failed, but it would be possible to remove or add hosts, as well as to
use a host selection pattern instead of a list.
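
As a rough illustration of the proposed "hosts" default, here is a minimal Python sketch. It assumes a hypothetical per-host result mapping (host name to a status string), and it uses the standard `fnmatch` module as a deliberately simplified stand-in for Ansible's host pattern syntax; neither is Ansible's actual API.

```python
from fnmatch import fnmatch


def default_retry_hosts(results):
    """Hosts whose recorded status is "failed", sorted by name.

    `results` is a hypothetical mapping of host name -> status string
    ("ok", "changed", "failed", ...); it is not an Ansible data structure.
    """
    return sorted(host for host, status in results.items() if status == "failed")


def select_hosts(all_hosts, hosts_entry):
    """Resolve the "hosts" entry of a retry file after the user's edits.

    A list is taken as-is; a string is treated as a shell-style glob,
    a simplification of Ansible's host pattern syntax.
    """
    if isinstance(hosts_entry, str):
        return [host for host in all_hosts if fnmatch(host, hosts_entry)]
    return list(hosts_entry)


results = {"web1": "ok", "web2": "failed", "db1": "failed"}
print(default_retry_hosts(results))           # ['db1', 'web2']
print(select_hosts(sorted(results), "web*"))  # ['web1', 'web2']
```

The auto-generated list is only a starting point: the user can edit it freely or replace it with a pattern string.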

The "start_at" would be auto-set to the task that failed, since
usually you don't want to retry from the beginning. However, it could be
removed to retry from the beginning, or changed to another task before
or after the failed one. The last option (retrying from a task after the
failed one) could be useful when you think the failure is not that
important and you don't want to spend time fixing it right away, but
want a quick workaround: bypass it at first and fix it later.
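
The three "start_at" choices described above (retry from the failed task, from the beginning, or from just after the failed task) can be sketched as slicing an ordered list of task names. The function and its `skip_failed` parameter are hypothetical illustrations, not Ansible options.

```python
def tasks_to_retry(tasks, start_at=None, skip_failed=False):
    """Return the tasks a retry would run, given an ordered list of task names.

    start_at=None     -> retry from the beginning ("start_at" removed)
    start_at=<name>   -> retry from that task (the proposed default is the failed task)
    skip_failed=True  -> begin just after <name>, bypassing it for now
    """
    if start_at is None:
        return list(tasks)
    index = tasks.index(start_at)  # raises ValueError for an unknown task name
    if skip_failed:
        index += 1
    return list(tasks[index:])


play = ["install packages", "write config", "start service"]
print(tasks_to_retry(play, "write config"))                    # ['write config', 'start service']
print(tasks_to_retry(play, "write config", skip_failed=True))  # ['start service']
print(tasks_to_retry(play))                                    # all three tasks
```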

The "notify" directive would force the notification of the handlers in
the list. This list would initially be auto-generated with the handlers
that had already been notified before the failure. The Ansible user
would be able to edit the list according to what he thinks is best for
recovering from the failure: he could remove some items from the list or
remove the whole list, and he could even add any extra handlers he
thinks are necessary.
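
A minimal sketch of how the "notify" list could be auto-generated and then applied, assuming a hypothetical record of the handlers notified before the failure and of any that had already run (for example via `meta: flush_handlers`). The function names are illustrative. One real Ansible behavior it mirrors: handlers run in the order they are defined in the handlers section, not in the order they were notified, so the edited list is re-ordered by definition order.

```python
def pending_handlers(notified, already_run):
    """Default "notify" list: handlers notified before the failure but not yet run."""
    seen = set()
    pending = []
    for name in notified:
        if name not in already_run and name not in seen:
            seen.add(name)
            pending.append(name)
    return pending


def handlers_to_run(defined_order, forced):
    """Order the (possibly user-edited) "notify" list by handler definition order."""
    unknown = set(forced) - set(defined_order)
    if unknown:
        raise ValueError(f"unknown handlers: {sorted(unknown)}")
    return [name for name in defined_order if name in forced]


defined = ["reload nginx", "restart app", "flush cache"]
default = pending_handlers(["restart app", "reload nginx", "restart app"], already_run=set())
print(default)                           # ['restart app', 'reload nginx']
edited = [h for h in default if h != "reload nginx"] + ["flush cache"]
print(handlers_to_run(defined, edited))  # ['restart app', 'flush cache']
```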

The "tags" directive would be auto-set to "all" to retry tasks whatever
their tags may be, but could also be restricted by passing a list with
specific tags.
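
The proposed "tags" semantics can be sketched as a filter: the special value "all" selects every task, tagged or not, while a list selects the tasks that share at least one tag with it. This mirrors how `--tags all` behaves on the ansible-playbook command line; representing a task as a (name, set of tags) pair is an assumption made for the example.

```python
def select_by_tags(tasks, tags="all"):
    """Filter (name, tags) pairs the way the proposed "tags" entry would.

    tags == "all" keeps every task; otherwise a task is kept when its
    own tags overlap the requested ones.
    """
    if tags == "all":
        return [name for name, _ in tasks]
    wanted = set(tags)
    return [name for name, task_tags in tasks if wanted & set(task_tags)]


tasks = [
    ("install packages", {"packages"}),
    ("write config", {"config"}),
    ("start service", set()),
]
print(select_by_tags(tasks))              # ['install packages', 'write config', 'start service']
print(select_by_tags(tasks, ["config"]))  # ['write config']
```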

To make it even more powerful, the retry file could also support
"pre_tasks" and "post_tasks" lists of one-time, ad-hoc tasks that the
Ansible user could write to quickly work around unforeseen problems
caused by an unexpected failure, before making a proper fix in his
playbooks.
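
To show where the one-time "pre_tasks" and "post_tasks" would sit relative to the retried tasks and the forced handlers, here is a sketch of a retry run plan. Everything in it (the `RetryFile` class, `run_plan`, the phase names) is hypothetical. Placing the forced handlers before "post_tasks" is an assumption borrowed from a normal play, where handlers notified by the main tasks run before post_tasks.

```python
from dataclasses import dataclass, field


@dataclass
class RetryFile:
    """Hypothetical in-memory form of the proposed retry file."""
    tasks: list                                    # playbook tasks left after start_at/tags
    notify: list = field(default_factory=list)     # handlers to force
    pre_tasks: list = field(default_factory=list)  # one-time ad-hoc workarounds
    post_tasks: list = field(default_factory=list)


def run_plan(retry):
    """Return the execution order of a retry as (phase, item) pairs.

    Forced handlers go after the playbook tasks and before post_tasks,
    matching where a normal play runs handlers notified by its tasks.
    """
    plan = [("pre_task", t) for t in retry.pre_tasks]
    plan += [("task", t) for t in retry.tasks]
    plan += [("handler", h) for h in retry.notify]
    plan += [("post_task", t) for t in retry.post_tasks]
    return plan


retry = RetryFile(
    tasks=["write config", "start service"],
    notify=["restart app"],
    pre_tasks=["remove stale lock file"],
)
for phase, item in run_plan(retry):
    print(phase, "->", item)
```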

What do you think?

-- 
You received this message because you are subscribed to the Google Groups 
"Ansible Project" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/ansible-project/532971A4.4010808%40yahoo.gr.
For more options, visit https://groups.google.com/d/optout.
