Re: [ansible-project] On replaying notifications after failures

Michael DeHaan Wed, 19 Mar 2014 05:27:08 -0700

I don't think a start-at is reasonable because different tasks may fail at
different points on different hosts.


The retry file should contain (for now) only the list of tasks and handlers
to force, and should rely on the playbook to do the right thing when re-run.


On Wed, Mar 19, 2014 at 6:29 AM, Petros Moisiadis <[email protected]> wrote:

>  On 03/18/2014 04:42 PM, Michael DeHaan wrote:
>
> I wouldn't want the system to generate two separate files, but it could
> generate a new system for retries.
>
>  We can think about this.
>
>
>
>
> On Tue, Mar 18, 2014 at 10:40 AM, Petros Moisiadis <[email protected]> wrote:
>
>>  On 03/18/14 16:20, Michael DeHaan wrote:
>>
>> "Making it a command-line option does not help, because you do not know
>> before running the command whether any of your tasks will fail, or how."
>>
>>  Incorrect, because you would use this when running the retry command
>> only.
>>
>>   Well, it did not occur to me that you could actually use that option
>> _after_ the failure. :-[
>> However, I think that a more controllable tool to run handlers
>> selectively could be more powerful for recovering from unexpected
>> deployment failures. For example, have ansible generate a file with a list
>> of notified but not executed handlers, which you can edit as you want and
>> then pass it to a '--handlers-file' option, in a similar fashion to how
>> limit files work for limiting hosts.
>>
>>
>>
>>
>>
>> On Tue, Mar 18, 2014 at 10:18 AM, Petros Moisiadis <[email protected]> wrote:
>>
>>>  On 03/18/14 14:44, Michael DeHaan wrote:
>>>
>>> "For example, if a task that changes a server's configuration fails,
>>> forcing the execution of a handler that reloads/restarts the server could
>>> lead to the server failing to operate properly or being unable to serve at
>>> all."
>>>
>>>  This is why it's a command line option to be used only when desired.
>>>
>>>   Making it a command-line option does not help, because you do not know
>>> before running the command whether any of your tasks will fail, or how.
>>>
>>> I think that the correct analysis of the problem is like this:
>>>
>>> You have designed a sequence of deployment tasks that should be run in a
>>> specific order. Your task specification language (the Ansible playbook)
>>> does not know what the best thing to do is if task execution is abnormally
>>> interrupted. Only you, the Ansible user, can know what should be done.
>>> Normally you expect everything to run fine, but if something goes wrong
>>> (and things can still go wrong on production deployments, even after tests
>>> have passed), you want the run to stop as early as possible. At the same
>>> time you want tools that help you mitigate the problems caused by the
>>> abnormal interruption, and you want full control over these tools. You do
>>> not want the tools to decide for you what should be done. You are the one
>>> to decide, and you can't do so before you actually see what failed and how.
>>>
>>> So, you need a tool to run already notified handlers (or some of them),
>>> but you will use it only if it is good for the health of your system. You
>>> cannot decide about that until you actually see what happened.
>>>
>>>
>>>
>>>
>>> On Tue, Mar 18, 2014 at 8:22 AM, Petros Moisiadis <[email protected]> wrote:
>>>
>>>>   On 03/17/14 15:51, Michael DeHaan wrote:
>>>>
>>>>  There was a post last week about adding a --force-handlers option.
>>>>
>>>> This can be done, though we're chasing some other items at present.
>>>>
>>>> Pull requests would be welcome.
>>>>
>>>>
>>>>
>>>>
>>>> On Sat, Mar 15, 2014 at 1:12 PM, Julio Monteiro <
>>>> [email protected]> wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> I am curious if/how Ansible plans on solving the "replay
>>>>> notifications" issue. I am having the exact same issue as reported on this
>>>>> StackOverflow question (the author does a great job describing the issue):
>>>>> http://stackoverflow.com/questions/21538516/ansible-how-to-replay-notifications
>>>>>
>>>>> I find that it is easy to avoid the problem by using tasks with "when:"
>>>>> statements instead of notifications. But I see this as a workaround:
>>>>> notifications are a great and easy feature, so there should be a default
>>>>> way to replay them, or at least a list of queued-but-not-executed
>>>>> notifications whenever a task fails.
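
As an illustration of the "when:" workaround mentioned above, here is a minimal sketch. The host group, file paths, service name, and the final command are all hypothetical, and this is only one reading of the workaround, not necessarily what the poster had in mind. The reload is an ordinary task placed right after the change and guarded by the registered result, rather than a handler queued until the end of the play.

```yaml
# Sketch only: host group, paths, service, and command are hypothetical.
- hosts: webservers
  tasks:
    - name: Deploy nginx configuration
      template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      register: nginx_conf

    # Runs immediately after the change instead of being queued as a
    # handler, so a failure in a later task cannot cause it to be skipped.
    - name: Reload nginx if the configuration changed
      service:
        name: nginx
        state: reloaded
      when: nginx_conf.changed

    - name: A later task that might fail
      command: /usr/local/bin/deploy-app
```

The trade-off is that this gives up handler deduplication: a handler notified by several tasks runs once when handlers are flushed, whereas each guarded task runs wherever it is placed in the task list.
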
>>>>>
>>>>> Thanks,
>>>>>   jmonteiro
>>>>>  --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Ansible Project" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/ansible-project/64e1dbe8-5b36-4e6b-b521-f086d2952008%40googlegroups.com<https://groups.google.com/d/msgid/ansible-project/64e1dbe8-5b36-4e6b-b521-f086d2952008%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>> Usually the task that notifies a handler has made some changes that
>>>> affect the behavior of the systems the handler acts on. If the notifying
>>>> task fails but its handler is forced to run, the behavior of those systems
>>>> could be unpredictable or unwanted. For example, if a task that changes a
>>>> server's configuration fails, forcing the execution of a handler that
>>>> reloads/restarts the server could lead to the server failing to operate
>>>> properly or being unable to serve at all. So, I think that a
>>>> '--force-handlers' option is quite risky and could lead to unpredictable
>>>> behavior. It would be better to let users control the (selective)
>>>> replaying of handlers only _after_ the failure occurs.
>
> I like the idea of having a powerful "--retry @retryfile" option with
> sensible defaults. The retry file could be as simple as a YAML file like
> the following:
>
> ---
> hosts:
>   - host_who_failed_1
>   - host_who_failed_2
>   - host_who_failed_3
> start_at: "The task that caused abnormal interruption"
> notify:
>   - A handler already notified before the abnormal interruption
>   - Another handler already notified before the abnormal interruption
> tags: all
>
> The "hosts" list would be auto-generated with the hosts that have failed,
> but it would be possible to add or remove hosts, or to use a host
> selection pattern instead of a list.
>
> The "start_at" would be auto-set to the task that has failed, since
> usually you don't want to retry from the beginning. It could be removed to
> retry from the beginning, or changed to another task before or after the
> task that failed. The last option (retrying from a task after the failed
> one) could be useful when you think the failure is not that important and
> you don't want to spend time fixing it when it occurs, preferring a quick
> workaround: bypass it at first and fix it later.
>
> The "notify" directive would force the notification of the handlers in the
> list. This list would initially be auto-generated with the handlers that had
> already been notified before the failure. The Ansible user would have the
> option to edit the list according to what he thinks is best for
> recovering from the failure. He could remove some items from the list or
> remove the whole list. He could even add any extra handlers he thinks
> are necessary.
>
> The "tags" directive would be auto-set to "all" to retry tasks whatever
> their tags may be, but could also be restricted by passing a list with
> specific tags.
>
> To make it even more powerful, the retry file could even support
> "pre_tasks" and "post_tasks" lists of one-time, ad-hoc tasks that the
> Ansible user could write quickly to work around unforeseen problems
> caused by an unexpected failure, before making a proper fix in
> his playbooks.
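
To make the editing described above concrete, here is a sketch of what a hand-edited retry file might look like under this proposal. The file format and the "--retry @retryfile" option are only proposed in this thread, so nothing in Ansible is known to read such a file, and every host, task, handler, tag, and path name below is made up for illustration.

```yaml
---
# Hypothetical edited retry file in the format proposed in this message.
# All names and paths are invented for illustration.
hosts: "web*"                          # host pattern instead of a list
start_at: "Deploy application code"    # resume at a task after the failed one
notify:
  - restart nginx                      # kept from the auto-generated list
tags:
  - deploy                             # restricted from the default "all"
pre_tasks:
  - name: Remove stale lock file left by the failed run
    file:
      path: /var/run/deploy.lock
      state: absent
```

Compared with the auto-generated defaults, this version replaces the host list with a pattern, moves "start_at" past the failed task, trims "notify" to one handler, and adds a one-time cleanup task.
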
>
> What do you think?
