Baltasar Cevc wrote:
> Well, actually the only use case I had in mind when quickly writing
> down these lines was the mentioned firewall check. It was more or less
> just some code to underline what I mean.
> 
> But to comment your suggestion: I think I don't get what you mean.
> Is it about parsing the monitor output and adjusting the exit code 
> depending on the error?
> 
> If it's that, I would not agree. When should a monitor not exit - I
> assume we should get a code even on very bad occasions like segfaults
> or similar. As the compiler generates that code (or the interpreter in
> our case), it should be a sane code (thus not 0, which means that it
> would be false), shouldn't it?
> Checking any specific messages or error codes would foil the goal to 
> have a generic monitor we can just prepend to any existing one.

I wasn't talking about writing specific checks, more a general concern
over the error messages.  If a monitor fails with an error code > 1,
there's still a problem - and it shouldn't get not'd(!'d) to success.

As we all know, and I apologize for even enumerating this: traditionally a
program returns an exit code of 0 on success (or at least, no error), 1
on failure, and other values for various other error conditions.

To use ping as an example (perhaps not the best, as mon recommends
fping) - if the host responds to the ping, the exit code is 0.  If the
host does not respond, the exit code is 1.  If there was an error of
some sort, you get an error message and a different exit code (e.g.
"ping 10" -> "connect: Invalid argument", returns 2).

If you now negate that to check that a host is down, an error exit like
the 2 above gets turned into success as well.  You will then erroneously
report the host down - the 'normal' status you are expecting - and you
might 'run with' that configuration thinking it's working as expected.
This is obviously an overly simplistic example, but there are other
cases where legitimate errors could crop up and would then return
'success' from any generic 'not' monitor.  I was merely suggesting that
it should pass through any exit code that isn't a 0 or a 1, and only
negate the 0's and 1's.
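
To make that concrete, here is a rough sketch of such a wrapper.  It is
not mon code and has not been run against real monitors; the function
names are made up for illustration, and mapping a signal death (such as a
segfault) to 128 + signal number is just the usual shell convention,
chosen here as an assumption.

```python
import subprocess
import sys


def negate_exit_code(code):
    """Swap 0 and 1; pass every other exit code through untouched."""
    if code == 0:
        return 1
    if code == 1:
        return 0
    return code


def run_negated(argv):
    """Run the wrapped monitor (argv is the command plus its arguments)
    and return the exit code the 'not' wrapper should report."""
    code = subprocess.run(argv).returncode
    if code < 0:
        # On POSIX, subprocess reports death by signal N as -N.
        # Assumption: report it as 128 + N, which is never 0, so a
        # crashing monitor can not be negated into a success.
        return 128 - code
    return negate_exit_code(code)
```

As a script this would end with something like
sys.exit(run_negated(sys.argv[1:])), so the monitor to negate and its
arguments are simply appended to the wrapper's command line.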

No one solution will work for all cases anyway, as some (many?) scripts
use the exit value to pass a useful parameter back - e.g. the
dns.monitor returns the number of servers that failed.  I just wanted to
point out that it might not always be safe to simply 'not' a script,
specifically because of those segfault/etc. conditions you mentioned.
You also lose the utility of the exit=range alert argument with the
generic NOT monitor, but you'll lose that regardless without custom
tweaking.

So, I guess I was warning that you would need specific checks, and
couldn't get away with a simple generic not script, after all.  Writing
a generic 'not' script that people could then copy and modify to suit
whatever monitor script they want to 'not' - with comments describing
what to watch out for - might be a worthwhile endeavor though.  Maybe
just accept parameters of what codes to transform into what, or what
ranges to return negated.  Okay, now I'm just rambling.
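
For what it's worth, the 'parameters of what codes to transform into
what' idea could look roughly like this.  The "0=1,1=0" spec syntax and
both function names are invented for this sketch; nothing like it exists
in mon as far as I know.

```python
def parse_mapping(spec):
    """Parse a hypothetical spec such as "0=1,1=0" into {0: 1, 1: 0}."""
    mapping = {}
    for pair in spec.split(","):
        source, target = pair.split("=")
        mapping[int(source)] = int(target)
    return mapping


def transform_exit_code(code, mapping):
    """Return the mapped code, or the original code if it is not listed."""
    return mapping.get(code, code)
```

Anything not listed in the spec falls through unchanged, so error codes
are only rewritten when someone has deliberately asked for it.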

In any case, I don't fault you for your code - any solution, or the
start of one, is better than nothing - I just wanted to make sure
no one grabs it and runs with it thinking it will work in all cases
(not just now, but also someone searching the mailing list in the
future).

cheers,
- Martin Norland, Sys Admin / Database / Web Developer, International
Outreach x3257

The opinion(s) contained within this email do not necessarily represent
those of St. Jude Children's Research Hospital.


_______________________________________________
mon mailing list
mon@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/mon
