Re: [ansible-project] On replaying notifications after failures

Michael DeHaan Wed, 19 Mar 2014 05:27:41 -0700

"list of tasks and handlers" => "list of failed hosts and handlers".




On Wed, Mar 19, 2014 at 8:26 AM, Michael DeHaan <[email protected]> wrote:

> I don't think a start-at is reasonable because different tasks may fail at
> different points on different hosts.
>
> It should only contain (for now) the list of tasks and handlers to force,
> and should rely on the playbook to do the right thing when re-run.
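Taking the corrected wording (failed hosts and handlers), here is a minimal sketch of how that retry data might be collected. It is not Ansible's API: `build_retry_data` and its three arguments are hypothetical names, and it assumes the run has already recorded, per host, which handlers were notified and which actually ran.

```python
from typing import Dict, List


def build_retry_data(failed_hosts: List[str],
                     notified: Dict[str, List[str]],
                     executed: Dict[str, List[str]]) -> dict:
    """Collect failed hosts plus handlers that were notified but never ran.

    All names here are hypothetical; this only illustrates the idea.
    """
    pending: List[str] = []
    for host in failed_hosts:
        done = set(executed.get(host, []))
        for handler in notified.get(host, []):
            # Keep first-seen order and avoid listing a handler twice.
            if handler not in done and handler not in pending:
                pending.append(handler)
    return {"hosts": list(failed_hosts), "notify": pending}


retry = build_retry_data(
    failed_hosts=["web1", "web2"],
    notified={"web1": ["restart nginx"],
              "web2": ["restart nginx", "reload app"],
              "web3": ["reload app"]},
    executed={"web3": ["reload app"]},
)
print(retry)
```

Nothing about the failing task is stored, which matches the point that the playbook itself should do the right thing when re-run.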
>
>
> On Wed, Mar 19, 2014 at 6:29 AM, Petros Moisiadis <[email protected]> wrote:
>
>>  On 03/18/2014 04:42 PM, Michael DeHaan wrote:
>>
>> I wouldn't want the system to generate two separate files, but it could
>> generate a new system for retries.
>>
>>  We can think about this.
>>
>>
>>
>>
>> On Tue, Mar 18, 2014 at 10:40 AM, Petros Moisiadis <[email protected]> wrote:
>>
>>>  On 03/18/14 16:20, Michael DeHaan wrote:
>>>
>>> "Being it a command line option does not help, because you do not know
>>> before running the command that any of your tasks is going to fail and how."
>>>
>>>  Incorrect, because you would use this when running the retry command
>>> only.
>>>
>>>   Well, it did not occur to me that you could actually use that option
>>> _after_ the failure. :-[
>>> However, I think that a more controllable tool to run handlers
>>> selectively could be more powerful for recovering from unexpected
>>> deployment failures. For example, have ansible generate a file with a list
>>> of notified but not executed handlers, which you can edit as you want and
>>> then pass it to a '--handlers-file' option, in a similar fashion to how
>>> limit files work for limiting hosts.
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Mar 18, 2014 at 10:18 AM, Petros Moisiadis <[email protected]> wrote:
>>>
>>>>  On 03/18/14 14:44, Michael DeHaan wrote:
>>>>
>>>> "For example, if a task that changes a server's configuration fails,
>>>> forcing the execution of a handler that reloads/restarts the server could
>>>> lead to the server failing to operate properly or be able to serve at all."
>>>>
>>>>  This is why it's a command line option to be used only when desired.
>>>>
>>>>   Being it a command line option does not help, because you do not
>>>> know before running the command that any of your tasks is going to fail and
>>>> how.
>>>>
>>>> I think that the correct analysis of the problem is like this:
>>>>
>>>> You have designed a sequence of deployment tasks that should be run in
>>>> a specific order. Your task specification language (ansible playbook) does
>>>> not know what the best thing to do is if task execution is abnormally
>>>> interrupted. Only you, the ansible user, will be able to know what should
>>>> be done. Normally you expect everything to run fine, but, if something goes
>>>> wrong (and things can still go wrong on production deployments, even after
>>>> tests have passed), you want your action to stop as early as possible. At
>>>> the same time you want to have the tools that will help you to mitigate the
>>>> problems caused by the abnormal interruption. And you want full control
>>>> over these tools. You do not want the tools to decide for you what
>>>> should be done. You are the one to decide, and you can't do so before you
>>>> actually see what failed and how.
>>>>
>>>> So, you need a tool to run already notified handlers (or part of them),
>>>> but you will use it only if it is good for the health of your system. You
>>>> cannot decide about that until you actually see what happened.
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Mar 18, 2014 at 8:22 AM, Petros Moisiadis <[email protected]> wrote:
>>>>
>>>>>   On 03/17/14 15:51, Michael DeHaan wrote:
>>>>>
>>>>>  There was a post last week about adding a --force-handlers option.
>>>>>
>>>>> This can be done, though we're chasing some other items at present.
>>>>>
>>>>> Pull requests would be welcome.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sat, Mar 15, 2014 at 1:12 PM, Julio Monteiro <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>> I am curious if/how Ansible plans on solving the "replay
>>>>>> notifications" issue. I am having the exact same issue as reported in
>>>>>> this StackOverflow question (the author does a great job describing the
>>>>>> issue):
>>>>>> http://stackoverflow.com/questions/21538516/ansible-how-to-replay-notifications
>>>>>>
>>>>>> I find that it is easy to avoid the problem by using tasks with "when:"
>>>>>> statements instead of notifications. But I really find that this is a
>>>>>> workaround, and notifications are a great and easy feature. There should
>>>>>> be a default way to replay them, or at least a list of
>>>>>> queued-but-not-executed notifications whenever a task fails.
>>>>>>
>>>>>> Thanks,
>>>>>>   jmonteiro
>>>>>>  --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "Ansible Project" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>> To post to this group, send email to [email protected]
>>>>>> .
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/ansible-project/64e1dbe8-5b36-4e6b-b521-f086d2952008%40googlegroups.com<https://groups.google.com/d/msgid/ansible-project/64e1dbe8-5b36-4e6b-b521-f086d2952008%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>>
>>>>> Usually the task that notifies a handler has made some changes that
>>>>> affect the behavior of the systems the handler acts on. If the notifying
>>>>> task fails, but its handler is forced to run, then the behavior of those
>>>>> systems could be unpredictable or unwanted. For example, if a task that
>>>>> changes a server's configuration fails, forcing the execution of a handler
>>>>> that reloads/restarts the server could lead to the server failing to
>>>>> operate properly or be able to serve at all. So, I think that a
>>>>> '--force-handlers' option is quite risky and could lead to unpredictable
>>>>> behavior. It would be better to let users control the (selective)
>>>>> replaying of the handlers only _after_ the failure occurs.
>>>>
>>>
>>
>> I like the idea of having a powerful "--retry @retryfile" option with
>> sensible defaults. The retry file could be as simple as a yaml file like
>> the following:
>>
>> ---
>> hosts:
>>   - host_who_failed_1
>>   - host_who_failed_2
>>   - host_who_failed_3
>> start_at: "The task that caused abnormal interruption"
>> notify:
>>   - A handler already notified before the abnormal interruption
>>   - Another handler already notified before the abnormal interruption
>> tags: all
>>
>> The "hosts" list would be auto-generated with the hosts that have failed,
>> but it would be possible to remove/add some hosts, as well as use a host
>> selection pattern instead of a list.
>>
>> The "start_at" would be auto-set to the task that has failed, since
>> usually you don't want to retry from the beginning. But it could be removed
>> to retry from the beginning or changed to another task before or after the
>> task that failed. The last option (to retry starting after the failed task)
>> could become useful in case you think that the failure is not that
>> important and you don't want to spend time fixing it at the time of the
>> occurrence, but want a quick workaround by bypassing it at first and fixing
>> it later.
>>
>> The "notify" directive would force the notification of the handlers in
>> the list. This list would initially be auto-generated with handlers that
>> had already been notified before the failure. The ansible user will have
>> the option to manipulate the list according to what he thinks is best for
>> recovering from the failure. He could remove some items from the list or
>> remove the whole list. He could even add any extra handlers he thinks are
>> necessary.
>>
>> The "tags" directive would be auto-set to "all" to retry tasks whatever
>> their tags may be, but could also be restricted by passing a list with
>> specific tags.
>>
>> To make it even more powerful, the retry file could even support
>> "pre_tasks" and "post_tasks" lists of one-time, ad-hoc tasks that the
>> ansible user could write to quickly work around unforeseen
>> problems caused by an unexpected failure, before making a proper fix in
>> his playbooks.
>>
>> What do you think?
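To make the defaults described above concrete, here is a rough sketch of how a tool might validate such a retry file once it has been parsed into a mapping (for example by a YAML loader, which is not shown). `RetrySpec` and `parse_retry_spec` are hypothetical names, and the handler name in the example is made up; none of this is an existing Ansible interface, and the "pre_tasks"/"post_tasks" idea is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class RetrySpec:
    """Hypothetical in-memory form of the proposed retry file."""
    hosts: Union[str, List[str]]           # list of hosts, or a host pattern
    start_at: Optional[str] = None         # None means retry from the beginning
    notify: List[str] = field(default_factory=list)  # handlers to force
    tags: Union[str, List[str]] = "all"    # "all", or a list of specific tags


def parse_retry_spec(data: dict) -> RetrySpec:
    """Validate a parsed retry mapping and fill in the proposed defaults."""
    known = {"hosts", "start_at", "notify", "tags"}
    unknown = set(data) - known
    if unknown:
        raise ValueError("unknown retry keys: %s" % ", ".join(sorted(unknown)))
    if not data.get("hosts"):
        raise ValueError("retry data must name at least one host or pattern")
    return RetrySpec(
        hosts=data["hosts"],
        start_at=data.get("start_at"),
        notify=list(data.get("notify") or []),
        tags=data.get("tags", "all"),
    )


# A retry file edited by hand: "start_at" removed, one handler kept.
spec = parse_retry_spec({
    "hosts": ["host_who_failed_1", "host_who_failed_2"],
    "notify": ["restart app"],
})
print(spec.start_at, spec.tags)
```

Rejecting unknown keys is a design choice of this sketch: since the file is meant to be edited by hand, a misspelled directive should fail loudly rather than be silently ignored.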
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Ansible Project" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/ansible-project/CAEVJ8QPpWZ-7ex2m7Wzc0S0pk%2BczeysTBF75f32Eb8TBJbSdrA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
