RE: Problem getting traps to work correctly

2006-07-14 Thread Tim Carr
That did the trick.  Thanks for your help.

Thanks,
Tim


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of David
Nolan
Sent: Thursday, July 13, 2006 5:03 PM
To: Tim Carr
Cc: mon@linux.kernel.org
Subject: Re: Problem getting traps to work correctly

On 7/13/06, Tim Carr [EMAIL PROTECTED] wrote:
> Here's a bit more information on it.  I've got the slave server
> configured for multiple services, each of them using the redistribute
> option:
>
>    redistribute alert trap.alert mainmonitor

If that's an exact quote, you've got the option wrong.  It's just
redistribute trap.alert mainmonitor.

> On the master server, once I've reset it, none of those servers will
> ever go green/good in mon.cgi - they stay in blue/unchecked status.

That sounds like you've still got the period-based trap configuration
in place.  (Which would match the above typo.)

If that's not true, and the line above was a typo in the email, not the
configuration, then maybe the redistribute code in CVS is broken.
Before I go investigate that possibility, please confirm whether the
line above was an exact quote from your config file.

> In the slave server, the history file shows this for an outage event:
>
> alert Store13-2 DRBD_Status 1152819579 /opt/mon/alert.d/trap.alert
> (mainmonitor) DRBD_Not_Running
> upalert Store13-2 DRBD_Status 1152819594 /opt/mon/alert.d/trap.alert
> (mainmonitor) DRBD_Not_Running

This also indicates to me that your old alert/upalert configuration is
still in place: redistribute does not generate history entries, because
doing so would bloat the history file on the slave server.

-David
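
The history-file check described above can be sketched in Python. This is a minimal sketch, not part of mon: it assumes each history entry sits on a single line in the form shown in the thread (type, group, service, epoch time, alert script, parenthesised arguments, summary), and the names HISTORY_RE and trap_alert_entries are made up for illustration. Any alert or upalert entry that calls trap.alert suggests the period-based trap configuration is still active, since redistribute writes no history entries.

```python
import os.path
import re

# Assumed layout of one history entry, based on the lines quoted in the thread.
HISTORY_RE = re.compile(
    r"^(?P<kind>\S+) (?P<group>\S+) (?P<service>\S+) (?P<time>\d+) "
    r"(?P<script>\S+) \((?P<args>[^)]*)\) (?P<summary>.*)$"
)


def trap_alert_entries(history_lines):
    """Return parsed alert/upalert entries whose alert script is trap.alert."""
    entries = []
    for line in history_lines:
        match = HISTORY_RE.match(line.strip())
        if match is None:
            continue  # not in the assumed format; ignore it
        is_alert = match.group("kind") in ("alert", "upalert")
        is_trap = os.path.basename(match.group("script")) == "trap.alert"
        if is_alert and is_trap:
            entries.append(match.groupdict())
    return entries


if __name__ == "__main__":
    sample = [
        "alert Store13-2 DRBD_Status 1152819579 "
        "/opt/mon/alert.d/trap.alert (mainmonitor) DRBD_Not_Running",
    ]
    for entry in trap_alert_entries(sample):
        print(entry["group"], entry["service"], entry["kind"])
```

An empty result on the slave after a test outage would be consistent with traps being sent by redistribute alone.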

___
mon mailing list
mon@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/mon


RE: Problem getting traps to work correctly

2006-07-13 Thread David Nolan



--On Thursday, July 13, 2006 14:01:58 -0500 Tim Carr [EMAIL PROTECTED] 
wrote:



> A question on the redistribute option, though - I'm not sure I can
> follow how the configuration works.  For example, my current remote
> server config is:


redistribute is a service-level config option, not a period-level one.  For
example:

watch Store13-2
   service DRBD_Status
      interval 15s
      monitor DRBDCheck.monitor -s you
      description Is\ DRBD\ working\ there?
      redistribute trap.alert mainmonitor


-David
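
The two configuration mistakes discussed in this thread (a literal "alert" after redistribute, and leftover alert/upalert lines that still call trap.alert) can be spotted with a small script. The following is a rough sketch under stated assumptions, not a mon tool: it looks only at the first words of each config line, does not parse the mon config grammar, and the function name find_trap_config_problems is hypothetical.

```python
def find_trap_config_problems(config_text):
    """Return (line_number, message) pairs for suspicious trap-related lines.

    Heuristic only: checks the first two words of each line and ignores
    everything else about the surrounding watch/service/period structure.
    """
    problems = []
    for lineno, raw in enumerate(config_text.splitlines(), start=1):
        words = raw.split()
        if not words or words[0].startswith("#"):
            continue  # skip blank lines and comment lines
        keyword, args = words[0], words[1:]
        if keyword == "redistribute" and args and args[0] == "alert":
            problems.append(
                (lineno, "redistribute is followed by the word 'alert'")
            )
        elif keyword in ("alert", "upalert") and args and args[0] == "trap.alert":
            problems.append(
                (lineno, "alert/upalert line still calls trap.alert")
            )
    return problems


if __name__ == "__main__":
    sample = (
        "watch Store13-2\n"
        "   service DRBD_Status\n"
        "   redistribute alert trap.alert mainmonitor\n"
    )
    for lineno, message in find_trap_config_problems(sample):
        print(f"line {lineno}: {message}")
```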





RE: Problem getting traps to work correctly

2006-07-13 Thread Tim Carr
Gotcha.  I threw that in, and it seems to work correctly, except I can't
tell if it is or not.  I'm watching the log file, and it shows alerts
being sent on an up/down event, but I'm not seeing alerts every 15s
showing up when things are working correctly.  Is that expected
behavior?

Thanks,
Tim

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of David Nolan
Sent: Thursday, July 13, 2006 7:06 AM
To: mon@linux.kernel.org
Subject: Re: Problem getting traps to work correctly



--On Wednesday, July 12, 2006 16:30:08 -0500 Tim Carr
[EMAIL PROTECTED] 
wrote:

> When mainmonitor gets one of the traps, I'll see this in
> /var/log/messages:
>
> Jul 11 16:20:04 monitor mon[2017]: trap received for undefined service
> type default/DRBD_Status
>
> ...but nothing will actually get kicked off and no mail is sent.  Also,
> the mon.cgi program (running on mainmonitor) will stay in the
> blue/unchecked status.


Looks like you've found a logic bug.  The code in handle_trap that sets the
group and service to default/default has an error which causes it to set
the group but never the service.  I just committed a fix for this to CVS.


> Jul 11 16:24:49 monitor mon[2017]: trap trap 0 from  grp=default
> svc=DRBD_Status, sta=0

In this case you're not getting an alert because the status field of the
trap is set to 0, which is the OK status.  It looks like the remote.alert
in CVS was never updated when Mon started using that field.  trap.alert was
rewritten...  Since these two alerts serve the same purpose, I'm going to
remove remote.alert from CVS.
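
To make the sta=0 point concrete, here is a minimal Python sketch that pulls the group, service, and status out of a syslog line like the one quoted above. It assumes the log entry has been joined back onto one line and that the grp=/svc=/sta= layout matches the quoted example; the names TRAP_RE and parse_trap_line are illustrative only. The only status value it interprets is 0, which is described above as the OK status.

```python
import re

# Pattern modelled on the single log line quoted in the thread.
TRAP_RE = re.compile(
    r"grp=(?P<grp>\S+)\s+svc=(?P<svc>[^,\s]+),\s*sta=(?P<sta>\d+)"
)


def parse_trap_line(line):
    """Return group, service and status from a trap log line, or None."""
    match = TRAP_RE.search(line)
    if match is None:
        return None
    status = int(match.group("sta"))
    return {
        "group": match.group("grp"),
        "service": match.group("svc"),
        "status": status,
        "is_ok": status == 0,  # 0 is the OK status, so no alert is expected
    }


if __name__ == "__main__":
    line = (
        "Jul 11 16:24:49 monitor mon[2017]: trap trap 0 from  "
        "grp=default svc=DRBD_Status, sta=0"
    )
    print(parse_trap_line(line))
```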



> Any thoughts as to what's going on here?  I'm trying to get this
> working:
>
> -An alert getting kicked off by the mainmonitor's system when it
> receives a trap; and
>
> -The mon.cgi program on mainmonitor showing an alert status once
> it's received that trap.



BTW, you might want to use the 'redistribute' config parameter for your
traps; that will cause all status updates to propagate to your main mon
server.  That way you can see when the last test occurred at all times.
From the current Mon manpage (in CVS):

   redistribute alert [arg...]

A service may have one redistribute option, which is a special form of
an alert definition.  This alert will be called on every service status
update, even sequential success status updates.  This can be used to
integrate Mon with another monitoring system, or to link together multiple
Mon servers via an alert script that generates Mon traps.


-David




RE: Problem getting traps to work correctly

2006-07-13 Thread David Nolan



--On Thursday, July 13, 2006 14:20:38 -0500 Tim Carr [EMAIL PROTECTED] 
wrote:



> Gotcha.  I threw that in, and it seems to work correctly, except I can't
> tell if it is or not.  I'm watching the log file, and it shows alerts
> being sent on an up/down event, but I'm not seeing alerts every 15s
> showing up when things are working correctly.  Is that expected
> behavior?
>
> Thanks,
> Tim



I refer to the server that sends the traps as the slave server, and the
server collecting the traps as the master server.  Your master server
should receive a trap on every status update on the slave server, i.e. a
trap every 15s in your example.  The master should only send alerts
according to its own alert configuration.  This makes receiving updates via
traps almost functionally equivalent to other monitor tests that you run on
your master server.

If that's not the behavior you're seeing, please let me know.

-David
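
One way to check the "trap every 15s" expectation is to compare the arrival times of traps on the master. The sketch below is only an illustration: it assumes you have already collected the arrival times as epoch seconds (for example from the master's log), and the function name find_trap_gaps and the slack factor are invented here rather than taken from mon.

```python
def find_trap_gaps(timestamps, interval=15, slack=2.0):
    """Return (previous, current) pairs of arrival times that are too far apart.

    A gap is reported when two consecutive traps are separated by more than
    interval * slack seconds. The slack factor is an arbitrary allowance for
    scheduling and network delay.
    """
    ordered = sorted(timestamps)
    limit = interval * slack
    return [
        (previous, current)
        for previous, current in zip(ordered, ordered[1:])
        if current - previous > limit
    ]


if __name__ == "__main__":
    # Hypothetical arrival times: two traps 15s apart, then a long silence.
    arrivals = [1152819579, 1152819594, 1152819700]
    for previous, current in find_trap_gaps(arrivals):
        print(f"no trap for {current - previous}s after {previous}")
```

If traps only show up around up/down events, the gaps reported here will be much longer than the service interval, which points at period-based alerts rather than redistribute doing the sending.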



RE: Problem getting traps to work correctly

2006-07-13 Thread Tim Carr
Here's a bit more information on it.  I've got the slave server
configured for multiple services, each of them using the redistribute
option:

   redistribute alert trap.alert mainmonitor

On the master server, once I've reset it, none of those servers will
ever go green/good in mon.cgi - they stay in blue/unchecked status.

If I force a failure on one of those services, the master server will
show that service going red.  Once I re-enable the service, it will then
show green.  But the other services for that slave server will never
change from unchecked state on the master server's mon.cgi.  Also, if
I then reset the mon process on the master server, all items will go
back to blue and will not change again unless I force a failure.

Also, I'm logging all output to a file on both the master and slave
server via these commands:

logdir = /var/log/mon
historicfile = /var/log/mon/history

In the slave server, the history file shows this for an outage event:

alert Store13-2 DRBD_Status 1152819579 /opt/mon/alert.d/trap.alert
(mainmonitor) DRBD_Not_Running
upalert Store13-2 DRBD_Status 1152819594 /opt/mon/alert.d/trap.alert
(mainmonitor) DRBD_Not_Running

On the master server, it will only log this for that same event:

trapalert Store13-2 DRBD_Status 1152819578 /opt/mon/alert.d/mail.alert
([EMAIL PROTECTED]) DRBD_Not_Running

Thanks,
Tim


-Original Message-
From: David Nolan [mailto:[EMAIL PROTECTED] 
Sent: Thursday, July 13, 2006 2:32 PM
To: Tim Carr; mon@linux.kernel.org
Subject: RE: Problem getting traps to work correctly



--On Thursday, July 13, 2006 14:20:38 -0500 Tim Carr
[EMAIL PROTECTED] 
wrote:

> Gotcha.  I threw that in, and it seems to work correctly, except I can't
> tell if it is or not.  I'm watching the log file, and it shows alerts
> being sent on an up/down event, but I'm not seeing alerts every 15s
> showing up when things are working correctly.  Is that expected
> behavior?
>
> Thanks,
> Tim


I refer to the server that sends the traps as the slave server, and the
server collecting the traps as the master server.  Your master server
should receive a trap on every status update on the slave server, i.e. a
trap every 15s in your example.  The master should only send alerts
according to its own alert configuration.  This makes receiving updates via
traps almost functionally equivalent to other monitor tests that you run on
your master server.

If that's not the behavior you're seeing, please let me know.

-David




Re: Problem getting traps to work correctly

2006-07-13 Thread David Nolan
On 7/13/06, Tim Carr [EMAIL PROTECTED] wrote:
> Here's a bit more information on it.  I've got the slave server
> configured for multiple services, each of them using the redistribute
> option:
>
>    redistribute alert trap.alert mainmonitor

If that's an exact quote, you've got the option wrong.  It's just
redistribute trap.alert mainmonitor.

> On the master server, once I've reset it, none of those servers will
> ever go green/good in mon.cgi - they stay in blue/unchecked status.

That sounds like you've still got the period-based trap configuration
in place.  (Which would match the above typo.)

If that's not true, and the line above was a typo in the email, not the
configuration, then maybe the redistribute code in CVS is broken.
Before I go investigate that possibility, please confirm whether the
line above was an exact quote from your config file.

> In the slave server, the history file shows this for an outage event:
>
> alert Store13-2 DRBD_Status 1152819579 /opt/mon/alert.d/trap.alert
> (mainmonitor) DRBD_Not_Running
> upalert Store13-2 DRBD_Status 1152819594 /opt/mon/alert.d/trap.alert
> (mainmonitor) DRBD_Not_Running

This also indicates to me that your old alert/upalert configuration is
still in place: redistribute does not generate history entries, because
doing so would bloat the history file on the slave server.

-David
