On 31 January 2016 at 14:09, Sam Ruby <[email protected]> wrote: > On Sun, Jan 31, 2016 at 8:29 AM, sebb <[email protected]> wrote: >> On 31 January 2016 at 13:20, Sam Ruby <[email protected]> wrote: >>> On Sun, Jan 31, 2016 at 7:56 AM, sebb <[email protected]> wrote: >>>> On 31 January 2016 at 12:24, Sam Ruby <[email protected]> wrote: >>>>> On Sun, Jan 31, 2016 at 6:45 AM, sebb <[email protected]> wrote: >>>>>> On 30 January 2016 at 00:17, Sam Ruby <[email protected]> wrote: >>>>>>> Commit 97680383779cd485498d6bb8fa8288af49cc2145: >>>>>>> warnings should trump info >>>>>>> also implement parsing todo >>>>>>> >>>>>>> >>>>>>> Branch: refs/heads/master >>>>>>> Author: Sam Ruby <[email protected]> >>>>>>> Committer: Sam Ruby <[email protected]> >>>>>>> Pusher: rubys <[email protected]> >>>>>>> >>>>>>> ------------------------------------------------------------ >>>>>>> www/status/monitors/public_json.rb | ++++++ >>>>>>> --- >>>>>>> ------------------------------------------------------------ >>>>>>> 16 changes: 11 additions, 5 deletions. >>>>>>> ------------------------------------------------------------ >>>>>>> >>>>>>> >>>>>>> diff --git a/www/status/monitors/public_json.rb >>>>>>> b/www/status/monitors/public_json.rb >>>>>>> index 39eb31b..7ff71b1 100644 >>>>>>> --- a/www/status/monitors/public_json.rb >>>>>>> +++ b/www/status/monitors/public_json.rb >>>>>>> @@ -21,16 +21,22 @@ def Monitor.public_json(previous_status) >>>>>>> # Ignore Wunderbar logging for normal messages (may occur >>>>>>> multiple times) >>>>>>> contents.gsub! /^(_INFO|_DEBUG) .*\n+/, '' >>>>>>> >>>>>>> - # Wunderbar warning (TODO - extract the text and return it?) >>>>>>> - if contents.gsub! /^_WARN .*?\n+/, '' >>>>>>> - status[name].merge! level: 'warning', title: 'warning' >>>>>>> - end >>>>>>> - >>>>>>> # diff -u output: >>>>>>> if contents.sub! /^--- .*?\n(\n|\Z)/m, '' >>>>>>> status[name].merge! level: 'info', title: 'updated' >>>>>>> end >>>>>>> >>>>>>> + # Wunderbar warning >>>>>>> + warnings = contents.scan(/^_WARN (.*?)\n+/) >>>>>>> + if warnings.length == 1 >>>>>>> + contents.sub! /^_WARN (.*?)\n+/, '' >>>>>>> + status[name].merge! level: 'warning', data: $1 >>>>>>> + elsif warnings.length > 0 >>>>>>> + contents.gsub! /^_WARN (.*?)\n+/, '' >>>>>>> + status[name].merge! level: 'warning', data: warnings.flatten, >>>>>>> + title: "#{warnings.length} warnings" >>>>>>> + end >>>>>>> + >>>>>> >>>>>> Is it intended that the title is omitted when there is only one _WARN >>>>>> line? >>>>>> If not, the else clause would be sufficient. >>>>> >>>>> Normally titles aren't intended to set by monitors: >>>>> >>>>> https://github.com/apache/whimsy/tree/master/www/status#titles >>>>> >>>>> The intent of the above is to surface the individual warning should >>>>> there only be one. Otherwise provide a count of the number of >>>>> warnings. >>>>> >>>>> If you note, the warnings that we are getting via email are fairly >>>>> informative (e.g., they tell you what LDAP server is down), instead of >>>>> simply saying "1 warnings". >>>> >>>> OK >>>> >>>>>> Also I see that the Ping My Box service is set up to treat warning >>>>>> level as a failure. >>>>>> I assume we cannot change that just for Whimsy monitoring. >>>>> >>>>> Actually, we can: >>>>> >>>>> https://github.com/apache/whimsy/tree/master/www/status#alerts >>>>> https://github.com/apache/whimsy/blob/master/www/status/index.cgi#L24 >>>>> >>>>>> I don't think we want PMB 'Down' mails just for warnings, so perhaps >>>>>> we should convert the _WARN to informational level? >>>>>> >>>>>>> unless contents.empty? >>>>>>> status[name].merge! level: 'danger', data: contents.split("\n") >>>>>>> end >>>>> >>>>> Let's take an individual LDAP server being unavailable as an example. >>>>> >>>>> The question is: given that we have LDAP failover, is one server being >>>>> down only informational or something that you actually want to warn >>>>> somebody about? When a warning is issued, do you want the warning to >>>>> only show up in the status page should somebody happen to look for it, >>>>> or actually trigger an email? >>>>> >>>>> I took an initial stab at it (assuming warnings would only be used >>>>> when you actually wanted a human to see the message). But this is >>>>> open for discussion. Thoughts? >>>> >>>> I used WARN for LDAP failover because it's more than just >>>> informational, but not an error. >>>> >>>> The intention was that the WARN would be seen, but not to trigger a DOWN >>>> alert. >>>> So for example if the same LDAP host keeps failing, we could remove it >>>> from the list. >>> >>> To me, you have just described an informational message. >> >> Huh? >> >> An informational message is one such as git_info. >> It does not matter if it is never noticed. > > I would consider that a success message. >
Perhaps. >> However the LDAP failure is potentially important, and needs to be >> kept beyond the lifetime of the log file. >> At present AFAICT info messages are lost once the new log file is created. >> >>>> Similarly for other WARN messages (not that there are any yet). >>>> >>>> I had hoped that the system would allow for different types of e-mails >>>> rather than just DOWN/UP but if that is not possible then it would be >>>> helpful to have the warnings listed somewhere which can be reviewed >>>> from time to time. >>> >>> I believe that a lot more is possible, but haven't dug in to find out. >>> Without knowing how we were going to ultimately use it, Daniel Gruno >>> enabled the simplest check possible so that we could explore what we >>> ultimately wanted. Once we have a concrete proposal, perhaps more >>> checks can be added. >>> >>> Meanwhile, what we have is a binary option (pass/fail). And the >>> ability to provide a text message on failure. >>> >>> Planning ahead, I split both pass and fail into two categories; >>> (success, info), (warn, danger). The web page is designed to show the >>> highest level, so so if there is an info message mixed in with success >>> messages, that's what will show up. If there is a danger message >>> mixed in with a number of warnings, that's what will show up. With >>> this scheme, the difference between informational and warning being >>> informational is something you might want to be aware of, but don't >>> trigger an alert. >>> >>> Perhaps we need more levels if you really want something between info and >>> warn? >> >> I don't think we need more levels. >> >> We just need to be able to treat warn messages differently from info and >> error. > > We seem to have a semantic problem here. Let's momentarily put aside > all labels as they seem to be distracting from the discussion. > > Looking at the "generate an alert and email" boolean: true or false. I think this may be where the issue is: I think it would be useful to generate two kinds of notifications: Alerts that need to be dealt with promptly. Alerts that need to be dealt with at some point. > The first question is: how many levels do we need for things that do > not generate an alert? I will assert that this needs to be at least > two. I guess so. But the question is how are these recorded? Are the ephemeral, until the next run, or do they last longer than that? > The next question is: how many levels do we need for things that do > generate an alert? Again, I will argue that this needs to be at least > two. > > Levels are useful in the case where there are multiple messages, with > some being more important than others. > > Once we figure out how many levels we need, we can figure out what the > names for the various levels should be. It's not a naming issue, it's what is done with the different types of notifications. I don't think the current alternatives are sufficient to properly handle the different types. > - Sam Ruby
