Hi,
First let me say thanks to Jim Trocki (and everyone who has helped) for
mon. After using another package for the last couple of years, I decided
to look around for something that would be easier to maintain/extend.
After configuring mon, I decided I wanted two new features:
1) The ability to have an "only_hosts" definition in addition to the
"exclude_hosts" definition (Note: I'm not hooked on the specific
name "only_hosts"; it's just what I came up with at the time).
I wanted this so that I could group all of my webservers in one
watch group. The problem is that some webservers run on a special
port. For example:
[...]
define(_NORMAL_WWW_, `sales marketing')dnl
define(_SECURE_WWW_, `sales')dnl
define(_OTHER_WWW_, `tests')dnl
define(_WEBSERVERS_, _NORMAL_WWW_ _SECURE_WWW_ _OTHER_WWW_)dnl
hostgroup webservers _WEBSERVERS_

watch webservers
    service ping
        monitor fping.monitor
    service telnet
        monitor telnet.monitor
    service freespace
        monitor snmpdiskspace.monitor --community fubar
    service http
        monitor http.monitor
        only_hosts _NORMAL_WWW_
    service https
        monitor tcp.monitor -p 443
        only_hosts _SECURE_WWW_
    service testhttp
        monitor tcp.monitor -p 8001
        only_hosts _OTHER_WWW_
[...]
I realize this can be accomplished using the "exclude_hosts"
definition, but that requires remembering every m4 host definition
that doesn't belong in a particular service. If you add a new m4
group, you have to find the watch group and add the new group to
every service definition to which it doesn't apply.
BTW, this patch also removes duplicate hosts from the hostgroup before
the monitor is run (it's quite easy to end up with duplicate hosts using
this method), so no host gets checked twice. :-)
2) The ability to have "hostname" entries in addition to "hostgroup"
entries. For hosts that are in multiple hostgroups, I'd rather define
the host once and list all of the groups it belongs to than find each
individual hostgroup and add the host to it. This also makes it much
easier for me to temporarily remove a host from mon. For example:
[...]
hostgroup webservers sales tests

hostgroup nfsservers nfs1 nfs2

hostname oddserver webservers nfsservers
[...]
Now if oddserver is going to be down for a while, I can just comment
out that one line of my config file. Again, I realize this can be
accomplished in other ways (specifically with the m4 approach), but
I don't feel those are nearly as clean as what I've implemented.
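To make the only_hosts rules from the first feature concrete, here is a small stand-alone Perl sketch of the host selection the patched run_monitor() performs for one service. The subroutine name select_hosts() and its arguments are hypothetical (they are not part of mon); the logic follows the patch: a non-empty only_hosts list replaces the group's hosts outright, otherwise disabled ("*"-prefixed) hosts and exclude_hosts entries are dropped, and duplicates are removed last.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of mon) mirroring the host selection in the
# patched run_monitor(): which hosts would a service's monitor receive?
#   $group_hosts   - array ref of the watch group's hosts
#   $only_hosts    - hash ref built from the service's "only_hosts" line
#   $exclude_hosts - hash ref built from the service's "exclude_hosts" line
sub select_hosts {
    my ($group_hosts, $only_hosts, $exclude_hosts) = @_;
    my @hosts;

    if (%$only_hosts) {
        # only_hosts replaces the group's list; as in the patch, the names
        # are not checked against the hostgroup. Sorted here only to get a
        # stable order (the patch uses hash order).
        @hosts = sort keys %$only_hosts;
    } else {
        # mon marks disabled hosts with a leading "*"; skip them
        @hosts = grep { !/^\*/ } @$group_hosts;
        # then drop the per-service excludes
        @hosts = grep { !$exclude_hosts->{$_} } @hosts;
    }

    # remove duplicates, keeping the first occurrence of each host
    my %seen;
    return grep { !$seen{$_}++ } @hosts;
}

# _WEBSERVERS_ from the example expands to "sales marketing sales tests"
my @webservers = qw(sales marketing sales tests);

print join(" ", select_hosts(\@webservers, {}, {})), "\n";              # sales marketing tests
print join(" ", select_hosts(\@webservers, { sales => 1 }, {})), "\n";  # sales
print join(" ", select_hosts(\@webservers, {}, { tests => 1 })), "\n";  # sales marketing
```

Note that, like the patch, the sketch gives only_hosts precedence: when it is set, exclude_hosts and the "*" disabled markers in the group are not consulted for that service.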
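The hostname feature boils down to appending a host to each hostgroup it names, unless that group already lists it. The sketch below shows that step in isolation on an in-memory hash; merge_hostname_entries() and its record format are hypothetical, and it is not the read_cf() code itself (it skips parsing and continuation lines).

```perl
use strict;
use warnings;

# Hypothetical stand-alone sketch (not the actual read_cf code) of what a
# "hostname" record does: append the host to every hostgroup it names,
# unless that group already lists it, enabled or disabled ("*host").
#   $groups  - hash ref: hostgroup name => array ref of hosts
#   $entries - array ref of [ host, group, group, ... ] records
sub merge_hostname_entries {
    my ($groups, $entries) = @_;
    for my $entry (@$entries) {
        my ($host, @member_of) = @$entry;
        for my $g (@member_of) {
            # \Q...\E makes the dots in a hostname match literally
            next if grep { /^\*?\Q$host\E$/ } @{ $groups->{$g} || [] };
            # like the patch, this creates the group if it does not exist yet
            push @{ $groups->{$g} }, $host;
        }
    }
    return $groups;
}

my %groups = (
    webservers => [qw(sales tests)],
    nfsservers => [qw(nfs1 nfs2)],
);

# equivalent of "hostname oddserver webservers nfsservers"
merge_hostname_entries(\%groups, [ [qw(oddserver webservers nfsservers)] ]);

print "webservers: @{$groups{webservers}}\n";   # webservers: sales tests oddserver
print "nfsservers: @{$groups{nfsservers}}\n";   # nfsservers: nfs1 nfs2 oddserver
```

Because the lookup only sees hosts already placed in a group, and a later hostgroup line presumably assigns a fresh list to its group, hostname entries have to come after the hostgroups they name.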
I've appended my patches against mon-0.99.2 to this message, including
changes to the documentation and example files. The two patches are
independent: either one can be installed separately, or you can install both.
Comments/Suggestions/Flames are appreciated. :-)
...dave alden
--- mon_ORIG Tue Oct 16 09:28:41 2001
+++ mon Tue Oct 16 09:33:09 2001
@@ -1166,6 +1166,7 @@
$sref->{"dep_behavior"} = $DEP_BEHAVIOR;
$sref->{"exclude_period"} = "";
$sref->{"exclude_hosts"} = {};
+ $sref->{"only_hosts"} = {};
$sref->{"_op_status"} = $STAT_UNTESTED;
$sref->{"_last_op_status"} = $STAT_UNTESTED;
$sref->{"_ack"} = 0;
@@ -1458,6 +1459,16 @@
$args = $ex;
}
+ elsif ($var eq "only_hosts")
+ {
+ my $on = {};
+ foreach my $h (split (/\s+/, $args))
+ {
+ $on->{$h} = 1;
+ }
+ $args = $on;
+ }
+
elsif ($var eq "exclude_period" && inPeriod (time, $args) == -1)
{
close (CFG);
@@ -2620,6 +2631,10 @@
join (" ", keys %{$sref->{exclude_hosts}}) . "'"
if (keys %{$sref->{"exclude_hosts"}});
+ $buf .= " only_hosts='" .
+ join (" ", keys %{$sref->{only_hosts}}) . "'"
+ if (keys %{$sref->{"only_hosts"}});
+
$buf .= " randskew=$sref->{randskew}"
if ($sref->{"randskew"});
@@ -2978,7 +2993,8 @@
#
sub run_monitor {
my ($group, $service) = @_;
- my (@args, @groupargs, $pid, @ghosts, $monitor, $monitorargs);
+ my (@args, @groupargs, $pid, @ghosts, $monitor, $monitorargs,
+ @thosts, %seen, $on);
my $sref = \%{$watch{$group}->{$service}};
@@ -3008,23 +3024,44 @@
# exclude disabled hosts
#
} else {
- @ghosts = grep (!/^\*/, @{$groups{$group}});
- #
- # per-service excludes
- #
- if (keys %{$sref->{"exclude_hosts"}})
+ if (keys %{$sref->{"only_hosts"}})
{
my @g = ();
- for (my $i=0; $i<@ghosts; $i++)
+ foreach $on (keys %{$sref->{"only_hosts"}})
{
- push (@g, $ghosts[$i])
- if !$sref->{"exclude_hosts"}->{$ghosts[$i]};
+ push (@g, $on);
}
- @ghosts = @g;
+ @thosts = @g;
+
+ } else {
+
+ @thosts = grep (!/^\*/, @{$groups{$group}});
+
+ #
+ # per-service excludes
+ #
+ if (keys %{$sref->{"exclude_hosts"}})
+ {
+ my @g = ();
+
+ for (my $i=0; $i<@thosts; $i++)
+ {
+ push (@g, $thosts[$i])
+ if !$sref->{"exclude_hosts"}->{$thosts[$i]};
+ }
+
+ @thosts = @g;
+ }
}
+
+ #
+ # get rid of duplicate hosts
+ #
+ %seen = ();
+ @ghosts = grep { ! $seen{$_}++ } @thosts;
@args = (quotewords ('\s+', 0, $monitor), @ghosts);
}
--- doc/mon.8_ORIG Tue Oct 16 09:28:41 2001
+++ doc/mon.8 Tue Oct 16 09:41:08 2001
@@ -991,6 +991,12 @@
will be excluded from the service check.
.TP
+.BI only_hosts " host [host...]"
+Only hosts listed after
+.B only_hosts
+will be included in the service check.
+
+.TP
.BI exclude_period " periodspec"
Do not run a scheduled monitor during the time
identified by
--- etc/example.m4_ORIG Tue Oct 16 09:28:41 2001
+++ etc/example.m4 Tue Oct 16 09:40:05 2001
@@ -38,6 +38,14 @@
define(_RAS_EMAIL_, `bob')dnl # bob is the remote access admin
dnl
dnl #
+dnl # Webserver definitions
+dnl #
+dnl
+define(_NORMAL_WEBSERVERS_, `fubar.com sales.com')dnl
+define(_SECURE_WEBSERVERS_, `sales.com')dnl
+define(_WEBSERVERS_, _SECURE_WEBSERVERS_ _NORMAL_WEBSERVERS_)dnl
+dnl
+dnl #
dnl # -------------------------actual config begins here-------------------------
dnl #
#
@@ -87,7 +95,7 @@
hostgroup netapps f330 f540
-hostgroup wwwservers www
+hostgroup wwwservers _WEBSERVERS_
hostgroup printers hp5si hp5c hp750c
@@ -200,9 +208,19 @@
interval 4m
monitor http.monitor
allow_empty_group
+ only_hosts _NORMAL_WEBSERVERS_
period _ANYTIME_
alert qpage.alert _MIS_PAGER_
upalert mail.alert -S "web server is back up" _MIS_EMAIL_
+ alertevery 45m
+ service https
+ interval 4m
+ monitor tcp.monitor -p 443
+ allow_empty_group
+ only_hosts _SECURE_WEBSERVERS_
+ period _ANYTIME_
+ alert qpage.alert _MIS_PAGER_
+ upalert mail.alert -S "secure web server is back up" _MIS_EMAIL_
alertevery 45m
service telnet
monitor telnet.monitor
--- mon_ORIG Tue Oct 16 09:01:53 2001
+++ mon Tue Oct 16 09:27:06 2001
@@ -754,8 +754,8 @@
#
sub read_cf {
my ($CF, $commit) = @_;
- my ($var, $watchgroup, $ingroup, $curgroup, $inwatch,
- $args, $hosts, %disabled, $h, $i,
+ my ($var, $watchgroup, $ingroup, $inhost, $curgroup, $curhost, $inwatch,
+ $args, $hosts, $groups, %disabled, $h, $g, $i,
$inalias, $curalias);
my ($sref, $pref);
my ($service, $period);
@@ -1002,11 +1002,13 @@
if ($l eq "")
{
$ingroup = 0;
+ $inhost = 0;
$inalias = 0;
$inwatch = 0;
$period = 0;
$curgroup = "";
+ $curhost = "";
$curalias = "";
$watchgroup = "";
@@ -1015,6 +1017,46 @@
}
#
+ # hostname record
+ #
+
+ if ($l =~ /^hostname\s+([a-zA-Z0-9_.-]+)\s*(.*)/)
+ {
+ $curhost = $1;
+
+ $inhost = 1;
+ $inalias = 0;
+ $ingroup = 0;
+ $inwatch = 0;
+ $period = 0;
+
+ $groups = $2;
+
+ foreach $g (split(/\s+/, $groups))
+ {
+ if (! grep(/^\*?\Q$curhost\E$/, @{$new_groups{$g}}))
+ {
+ push(@{$new_groups{$g}}, $curhost);
+ }
+ }
+
+ next;
+ }
+
+ if ($inhost)
+ {
+ foreach $g (split(/\s+/, $l))
+ {
+ if (! grep(/^\*?\Q$curhost\E$/, @{$new_groups{$g}}))
+ {
+ push(@{$new_groups{$g}}, $curhost);
+ }
+ }
+
+ next;
+ }
+
+ #
# hostgroup record
#
if ($l =~ /^hostgroup\s+([a-zA-Z0-9_.-]+)\s*(.*)/)
@@ -1023,6 +1065,7 @@
$ingroup = 1;
$inalias = 0;
+ $inhost = 0;
$inwatch = 0;
$period = 0;
@@ -1073,6 +1116,7 @@
{
$inalias = 1;
$ingroup = 0;
+ $inhost = 0;
$inwatch = 0;
$period = 0;
@@ -1098,6 +1142,7 @@
$inwatch = 1;
$inalias = 0;
$ingroup = 0;
+ $inhost = 0;
$period = 0;
if (!defined ($new_groups{$watchgroup}))
@@ -1115,6 +1160,7 @@
}
$curgroup = "";
+ $curhost = "";
$service = "";
next;
--- doc/mon.8_ORIG Tue Oct 16 09:01:53 2001
+++ doc/mon.8 Tue Oct 16 10:16:09 2001
@@ -507,6 +507,7 @@
.SH CONFIGURATION FILE
The configuration file consists of zero or more hostgroup definitions,
+zero or more hostname definitions,
and one or more watch definitions. Each watch definition may have one
or more service definitions. A line beginning with optional
leading whitespace and a pound ("#") is
@@ -860,6 +861,30 @@
nfsserver httpserver smbserver
hostgroup router_group cisco7000 agsplus
+.fi
+.RE
+
+.SS "Hostname Entries"
+
+Hostname entries begin with the keyword
+.BR hostname ,
+and are followed by a hostname (or IP address) and one or more hostgroups,
+separated by whitespace. The hostgroups must
+be composed of alphanumeric
+characters, a dash ("-"), a period ("."),
+or an underscore ("_"). Non-blank lines following
+the first hostname line are interpreted as more hostgroups.
+The hostname definition ends with a blank line. NOTE:
+.BR hostname
+entries MUST follow the
+.BR hostgroup
+entries, otherwise they will be lost. For example:
+
+.RS
+.nf
+hostname powerfulclient server client
+
+hostname wimpyclient client
.fi
.RE
--- etc/example.m4_ORIG Tue Oct 16 09:01:53 2001
+++ etc/example.m4 Tue Oct 16 09:25:48 2001
@@ -68,8 +68,10 @@
authtype = getpwnam
#
-# NB: hostgroup and watch entries are terminated with a blank line (or
-# end of file). Don't forget the blank lines between them or you lose.
+# NB: hostgroup, hostname and watch entries are terminated with a blank line
+# (or end of file). Don't forget the blank lines between them or you lose.
+# Also note that hostname entries MUST FOLLOW the hostgroup definitions;
+# placing them before will cause them to get lost.
#
#
@@ -83,7 +85,7 @@
hostgroup hubs cisco316t hp800t ssii10
-hostgroup workstations blue yellow red green cornflower violet
+hostgroup workstations blue yellow red cornflower violet
hostgroup netapps f330 f540
@@ -94,6 +96,11 @@
hostgroup new nntp
hostgroup ftp ftp
+
+#
+# hostname definitions (hostnames or IP addresses)
+#
+hostname green workstations wwwservers
#
# For the servers in building 1, monitor ping and telnet
--- etc/example.cf_ORIG Tue Oct 16 09:01:53 2001
+++ etc/example.cf Tue Oct 16 09:26:15 2001
@@ -27,8 +27,10 @@
authtype = getpwnam
#
-# NB: hostgroup and watch entries are terminated with a blank line (or
-# end of file). Don't forget the blank lines between them or you lose.
+# NB: hostgroup, hostname and watch entries are terminated with a blank line
+# (or end of file). Don't forget the blank lines between them or you lose.
+# Also note that hostname entries MUST FOLLOW the hostgroup definitions;
+# placing them before will cause them to get lost.
#
#
@@ -42,7 +44,7 @@
hostgroup hubs cisco316t hp800t ssii10
-hostgroup workstations blue yellow red green cornflower violet
+hostgroup workstations blue yellow red cornflower violet
hostgroup netapps f330 f540
@@ -53,6 +55,11 @@
hostgroup new nntp
hostgroup ftp ftp
+
+#
+# hostname definitions (hostnames or IP addresses)
+#
+hostname green workstations wwwservers
#
# For the servers in building 1, monitor ping and telnet