On 31 January 2016 at 14:09, Sam Ruby <[email protected]> wrote:
> On Sun, Jan 31, 2016 at 8:29 AM, sebb <[email protected]> wrote:
>> On 31 January 2016 at 13:20, Sam Ruby <[email protected]> wrote:
>>> On Sun, Jan 31, 2016 at 7:56 AM, sebb <[email protected]> wrote:
>>>> On 31 January 2016 at 12:24, Sam Ruby <[email protected]> wrote:
>>>>> On Sun, Jan 31, 2016 at 6:45 AM, sebb <[email protected]> wrote:
>>>>>> On 30 January 2016 at 00:17, Sam Ruby <[email protected]> wrote:
>>>>>>> Commit 97680383779cd485498d6bb8fa8288af49cc2145:
>>>>>>>     warnings should trump info
>>>>>>>     also implement parsing todo
>>>>>>>
>>>>>>>
>>>>>>> Branch: refs/heads/master
>>>>>>> Author: Sam Ruby <[email protected]>
>>>>>>> Committer: Sam Ruby <[email protected]>
>>>>>>> Pusher: rubys <[email protected]>
>>>>>>>
>>>>>>> ------------------------------------------------------------
>>>>>>> www/status/monitors/public_json.rb                           | ++++++ 
>>>>>>> ---
>>>>>>> ------------------------------------------------------------
>>>>>>> 16 changes: 11 additions, 5 deletions.
>>>>>>> ------------------------------------------------------------
>>>>>>>
>>>>>>>
>>>>>>> diff --git a/www/status/monitors/public_json.rb 
>>>>>>> b/www/status/monitors/public_json.rb
>>>>>>> index 39eb31b..7ff71b1 100644
>>>>>>> --- a/www/status/monitors/public_json.rb
>>>>>>> +++ b/www/status/monitors/public_json.rb
>>>>>>> @@ -21,16 +21,22 @@ def Monitor.public_json(previous_status)
>>>>>>>        # Ignore Wunderbar logging for normal messages (may occur 
>>>>>>> multiple times)
>>>>>>>        contents.gsub! /^(_INFO|_DEBUG) .*\n+/, ''
>>>>>>>
>>>>>>> -      # Wunderbar warning (TODO - extract the text and return it?)
>>>>>>> -      if contents.gsub! /^_WARN .*?\n+/, ''
>>>>>>> -        status[name].merge! level: 'warning', title: 'warning'
>>>>>>> -      end
>>>>>>> -
>>>>>>>        # diff -u output:
>>>>>>>        if contents.sub! /^--- .*?\n(\n|\Z)/m, ''
>>>>>>>          status[name].merge! level: 'info', title: 'updated'
>>>>>>>        end
>>>>>>>
>>>>>>> +      # Wunderbar warning
>>>>>>> +      warnings = contents.scan(/^_WARN (.*?)\n+/)
>>>>>>> +      if warnings.length == 1
>>>>>>> +        contents.sub! /^_WARN (.*?)\n+/, ''
>>>>>>> +        status[name].merge! level: 'warning', data: $1
>>>>>>> +      elsif warnings.length > 0
>>>>>>> +        contents.gsub! /^_WARN (.*?)\n+/, ''
>>>>>>> +        status[name].merge! level: 'warning', data: warnings.flatten,
>>>>>>> +          title: "#{warnings.length} warnings"
>>>>>>> +      end
>>>>>>> +
>>>>>>
>>>>>> Is it intended that the title is omitted when there is only one _WARN 
>>>>>> line?
>>>>>> If not, the else clause would be sufficient.
>>>>>
>>>>> Normally titles aren't intended to set by monitors:
>>>>>
>>>>> https://github.com/apache/whimsy/tree/master/www/status#titles
>>>>>
>>>>> The intent of the above is to surface the individual warning should
>>>>> there only be one.  Otherwise provide a count of the number of
>>>>> warnings.
>>>>>
>>>>> If you note, the warnings that we are getting via email are fairly
>>>>> informative (e.g., they tell you what LDAP server is down), instead of
>>>>> simply saying "1 warnings".
>>>>
>>>> OK
>>>>
>>>>>> Also I see that the Ping My Box service is set up to treat warning
>>>>>> level as a failure.
>>>>>> I assume we cannot change that just for Whimsy monitoring.
>>>>>
>>>>> Actually, we can:
>>>>>
>>>>> https://github.com/apache/whimsy/tree/master/www/status#alerts
>>>>> https://github.com/apache/whimsy/blob/master/www/status/index.cgi#L24
>>>>>
>>>>>> I don't think we want PMB 'Down' mails just for warnings, so perhaps
>>>>>> we should convert the _WARN to informational level?
>>>>>>
>>>>>>>        unless contents.empty?
>>>>>>>          status[name].merge! level: 'danger', data: contents.split("\n")
>>>>>>>        end
>>>>>
>>>>> Let's take an individual LDAP server being unavailable as an example.
>>>>>
>>>>> The question is: given that we have LDAP failover, is one server being
>>>>> down only informational or something that you actually want to warn
>>>>> somebody about?  When a warning is issued, do you want the warning to
>>>>> only show up in the status page should somebody happen to look for it,
>>>>> or actually trigger an email?
>>>>>
>>>>> I took an initial stab at it (assuming warnings would only be used
>>>>> when you actually wanted a human to see the message).  But this is
>>>>> open for discussion.  Thoughts?
>>>>
>>>> I used WARN for LDAP failover because it's more than just
>>>> informational, but not an error.
>>>>
>>>> The intention was that the WARN would be seen, but not to trigger a DOWN 
>>>> alert.
>>>> So for example if the same LDAP host keeps failing, we could remove it
>>>> from the list.
>>>
>>> To me, you have just described an informational message.
>>
>> Huh?
>>
>> An informational message is one such as git_info.
>> It does not matter if it is never noticed.
>
> I would consider that a success message.
>

Perhaps.

>> However the LDAP failure is potentially important, and needs to be
>> kept beyond the lifetime of the log file.
>> At present AFAICT info messages are lost once the new log file is created.
>>
>>>> Similarly for other WARN messages (not that there are any yet).
>>>>
>>>> I had hoped that the system would allow for different types of e-mails
>>>> rather than just DOWN/UP but if that is not possible then it would be
>>>> helpful to have the warnings listed somewhere which can be reviewed
>>>> from time to time.
>>>
>>> I believe that a lot more is possible, but haven't dug in to find out.
>>> Without knowing how we were going to ultimately use it, Daniel Gruno
>>> enabled the simplest check possible so that we could explore what we
>>> ultimately wanted.  Once we have a concrete proposal, perhaps more
>>> checks can be added.
>>>
>>> Meanwhile, what we have is a binary option (pass/fail).  And the
>>> ability to provide a text message on failure.
>>>
>>> Planning ahead, I split both pass and fail into two categories;
>>> (success, info), (warn, danger).  The web page is designed to show the
>>> highest level, so so if there is an info message mixed in with success
>>>  messages, that's what will show up.  If there is a danger message
>>> mixed in with a number of warnings, that's what will show up. With
>>> this scheme, the difference between informational and warning being
>>> informational is something you might want to be aware of, but don't
>>> trigger an alert.
>>>
>>> Perhaps we need more levels if you really want something between info and 
>>> warn?
>>
>> I don't think we need more levels.
>>
>> We just need to be able to treat warn messages differently from info and 
>> error.
>
> We seem to have a semantic problem here.  Let's momentarily put aside
> all labels as they seem to be distracting from the discussion.
>
> Looking at the "generate an alert and email" boolean: true or false.

I think this may be where the issue is: I think it would be useful to
generate two kinds of notifications:
Alerts that need to be dealt with promptly.
Alerts that need to be dealt with at some point.

> The first question is: how many levels do we need for things that do
> not generate an alert?  I will assert that this needs to be at least
> two.

I guess so.

But the question is how are these recorded?
Are the ephemeral, until the next run, or do they last longer than that?

> The next question is: how many levels do we need for things that do
> generate an alert?  Again, I will argue that this needs to be at least
> two.
>
> Levels are useful in the case where there are multiple messages, with
> some being more important than others.
>
> Once we figure out how many levels we need, we can figure out what the
> names for the various levels should be.

It's not a naming issue, it's what is done with the different types of
notifications.
I don't think the current alternatives are sufficient to properly
handle the different types.

> - Sam Ruby

Reply via email to