On 31 January 2016 at 13:20, Sam Ruby <[email protected]> wrote:
> On Sun, Jan 31, 2016 at 7:56 AM, sebb <[email protected]> wrote:
>> On 31 January 2016 at 12:24, Sam Ruby <[email protected]> wrote:
>>> On Sun, Jan 31, 2016 at 6:45 AM, sebb <[email protected]> wrote:
>>>> On 30 January 2016 at 00:17, Sam Ruby <[email protected]> wrote:
>>>>> Commit 97680383779cd485498d6bb8fa8288af49cc2145:
>>>>>     warnings should trump info
>>>>>     also implement parsing todo
>>>>>
>>>>>
>>>>> Branch: refs/heads/master
>>>>> Author: Sam Ruby <[email protected]>
>>>>> Committer: Sam Ruby <[email protected]>
>>>>> Pusher: rubys <[email protected]>
>>>>>
>>>>> ------------------------------------------------------------
>>>>> www/status/monitors/public_json.rb                           | ++++++ ---
>>>>> ------------------------------------------------------------
>>>>> 16 changes: 11 additions, 5 deletions.
>>>>> ------------------------------------------------------------
>>>>>
>>>>>
>>>>> diff --git a/www/status/monitors/public_json.rb 
>>>>> b/www/status/monitors/public_json.rb
>>>>> index 39eb31b..7ff71b1 100644
>>>>> --- a/www/status/monitors/public_json.rb
>>>>> +++ b/www/status/monitors/public_json.rb
>>>>> @@ -21,16 +21,22 @@ def Monitor.public_json(previous_status)
>>>>>        # Ignore Wunderbar logging for normal messages (may occur multiple 
>>>>> times)
>>>>>        contents.gsub! /^(_INFO|_DEBUG) .*\n+/, ''
>>>>>
>>>>> -      # Wunderbar warning (TODO - extract the text and return it?)
>>>>> -      if contents.gsub! /^_WARN .*?\n+/, ''
>>>>> -        status[name].merge! level: 'warning', title: 'warning'
>>>>> -      end
>>>>> -
>>>>>        # diff -u output:
>>>>>        if contents.sub! /^--- .*?\n(\n|\Z)/m, ''
>>>>>          status[name].merge! level: 'info', title: 'updated'
>>>>>        end
>>>>>
>>>>> +      # Wunderbar warning
>>>>> +      warnings = contents.scan(/^_WARN (.*?)\n+/)
>>>>> +      if warnings.length == 1
>>>>> +        contents.sub! /^_WARN (.*?)\n+/, ''
>>>>> +        status[name].merge! level: 'warning', data: $1
>>>>> +      elsif warnings.length > 0
>>>>> +        contents.gsub! /^_WARN (.*?)\n+/, ''
>>>>> +        status[name].merge! level: 'warning', data: warnings.flatten,
>>>>> +          title: "#{warnings.length} warnings"
>>>>> +      end
>>>>> +
>>>>
>>>> Is it intended that the title is omitted when there is only one _WARN line?
>>>> If not, the else clause would be sufficient.
>>>
>>> Normally titles aren't intended to set by monitors:
>>>
>>> https://github.com/apache/whimsy/tree/master/www/status#titles
>>>
>>> The intent of the above is to surface the individual warning should
>>> there only be one.  Otherwise provide a count of the number of
>>> warnings.
>>>
>>> If you note, the warnings that we are getting via email are fairly
>>> informative (e.g., they tell you what LDAP server is down), instead of
>>> simply saying "1 warnings".
>>
>> OK
>>
>>>> Also I see that the Ping My Box service is set up to treat warning
>>>> level as a failure.
>>>> I assume we cannot change that just for Whimsy monitoring.
>>>
>>> Actually, we can:
>>>
>>> https://github.com/apache/whimsy/tree/master/www/status#alerts
>>> https://github.com/apache/whimsy/blob/master/www/status/index.cgi#L24
>>>
>>>> I don't think we want PMB 'Down' mails just for warnings, so perhaps
>>>> we should convert the _WARN to informational level?
>>>>
>>>>>        unless contents.empty?
>>>>>          status[name].merge! level: 'danger', data: contents.split("\n")
>>>>>        end
>>>
>>> Let's take an individual LDAP server being unavailable as an example.
>>>
>>> The question is: given that we have LDAP failover, is one server being
>>> down only informational or something that you actually want to warn
>>> somebody about?  When a warning is issued, do you want the warning to
>>> only show up in the status page should somebody happen to look for it,
>>> or actually trigger an email?
>>>
>>> I took an initial stab at it (assuming warnings would only be used
>>> when you actually wanted a human to see the message).  But this is
>>> open for discussion.  Thoughts?
>>
>> I used WARN for LDAP failover because it's more than just
>> informational, but not an error.
>>
>> The intention was that the WARN would be seen, but not to trigger a DOWN 
>> alert.
>> So for example if the same LDAP host keeps failing, we could remove it
>> from the list.
>
> To me, you have just described an informational message.

Huh?

An informational message is one such as git_info.
It does not matter if it is never noticed.

However the LDAP failure is potentially important, and needs to be
kept beyond the lifetime of the log file.
At present AFAICT info messages are lost once the new log file is created.

>> Similarly for other WARN messages (not that there are any yet).
>>
>> I had hoped that the system would allow for different types of e-mails
>> rather than just DOWN/UP but if that is not possible then it would be
>> helpful to have the warnings listed somewhere which can be reviewed
>> from time to time.
>
> I believe that a lot more is possible, but haven't dug in to find out.
> Without knowing how we were going to ultimately use it, Daniel Gruno
> enabled the simplest check possible so that we could explore what we
> ultimately wanted.  Once we have a concrete proposal, perhaps more
> checks can be added.
>
> Meanwhile, what we have is a binary option (pass/fail).  And the
> ability to provide a text message on failure.
>
> Planning ahead, I split both pass and fail into two categories;
> (success, info), (warn, danger).  The web page is designed to show the
> highest level, so so if there is an info message mixed in with success
>  messages, that's what will show up.  If there is a danger message
> mixed in with a number of warnings, that's what will show up. With
> this scheme, the difference between informational and warning being
> informational is something you might want to be aware of, but don't
> trigger an alert.
>
> Perhaps we need more levels if you really want something between info and 
> warn?

I don't think we need more levels.

We just need to be able to treat warn messages differently from info and error.

> - Sam Ruby

Reply via email to