RE: Problem getting traps to work correctly
That did the trick. Thanks for your help.

Thanks,
Tim

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of David Nolan
Sent: Thursday, July 13, 2006 5:03 PM
To: Tim Carr
Cc: mon@linux.kernel.org
Subject: Re: Problem getting traps to work correctly

On 7/13/06, Tim Carr [EMAIL PROTECTED] wrote:
> Here's a bit more information on it. I've got the slave server configured for multiple services, each of them using the redistribute option:
>
>     redistribute alert trap.alert mainmonitor

If that's an exact quote, you've got the option wrong. It's just:

    redistribute trap.alert mainmonitor

> On the master server, once I've reset it, none of those servers will ever go green/good in mon.cgi - they stay in blue/unchecked status.

That sounds like you've still got the period-based trap configuration in place (which would match the typo above). If that's not true, and the line above was a typo in the email rather than in the configuration, then maybe the redistribute code in CVS is broken. Before I go investigate that possibility, please confirm whether the line above was an exact quote from your config file.

> In the slave server, the history file shows this for an outage event:
>
>     alert Store13-2 DRBD_Status 1152819579 /opt/mon/alert.d/trap.alert (mainmonitor) DRBD_Not_Running
>     upalert Store13-2 DRBD_Status 1152819594 /opt/mon/alert.d/trap.alert (mainmonitor) DRBD_Not_Running

This also indicates to me that your old alert/upalert configuration is still in place: redistribute does not generate history entries, because doing so would bloat the history file on the slave server.

-David

___
mon mailing list
mon@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/mon
RE: Problem getting traps to work correctly
--On Thursday, July 13, 2006 14:01:58 -0500 Tim Carr [EMAIL PROTECTED] wrote:
> A question on the redistribute option, though - I'm not sure I can follow how the configuration works. For example, my current remote server config is:

redistribute is a service-level config option, not a period option. For example:

    watch Store13-2
        service DRBD_Status
            interval 15s
            monitor DRBDCheck.monitor -s you
            description Is\ DRBD\ working\ there?
            redistribute trap.alert mainmonitor

-David
RE: Problem getting traps to work correctly
Gotcha. I threw that in, and it seems to work correctly, except I can't tell if it is or not. I'm watching the log file, and it shows alerts being sent on an up/down event, but I'm not seeing alerts every 15s showing up when things are working correctly. Is that expected behavior?

Thanks,
Tim

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of David Nolan
Sent: Thursday, July 13, 2006 7:06 AM
To: mon@linux.kernel.org
Subject: Re: Problem getting traps to work correctly

--On Wednesday, July 12, 2006 16:30:08 -0500 Tim Carr [EMAIL PROTECTED] wrote:
> When mainmonitor gets one of the traps, I'll see this in /var/log/messages:
>
>     Jul 11 16:20:04 monitor mon[2017]: trap received for undefined service type default/DRBD_Status
>
> ...but nothing will actually get kicked off and no mail is sent. Also, the mon.cgi program (running on mainmonitor) will stay in the blue/unchecked status.

Looks like you've found a logic bug. The code to set the group service in handle_trap to default/default has an error which causes it to set the group but never the service. I just committed a fix for this to CVS.

>     Jul 11 16:24:49 monitor mon[2017]: trap trap 0 from grp=default svc=DRBD_Status, sta=0

In this case you're not getting an alert because the status bit of the trap is set to 0, which is the OK status. It looks like the remote.alert in CVS was never updated when Mon started using that field. trap.alert was rewritten... Since these two alerts serve the same purpose, I'm going to remove remote.alert from CVS.

> Any thoughts as to what's going on here? I'm trying to get this working:
> - An alert getting kicked off by the mainmonitor's system when it receives a trap; and
> - The mon.cgi program on mainmonitor showing an alert status once it's received that trap.

BTW, you might want to use the 'redistribute' config parameter for your traps; that will cause all status updates to propagate to your main mon server. That way you can see when the last test occurred at all times.

From the current Mon manpage (in CVS):

    redistribute alert [arg...]
        A service may have one redistribute option, which is a special form of an alert definition. This alert will be called on every service status update, even sequential success status updates. This can be used to integrate Mon with another monitoring system, or to link together multiple Mon servers via an alert script that generates Mon traps.

-David
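The `sta=0` diagnosis above can be checked mechanically when there are many trap lines in /var/log/messages. The following is only a rough sketch: the regular expression is inferred from the single log line quoted in this message, so the exact wording may differ between Mon versions, and the function name `parse_trap_log` is made up for illustration.

```python
import re
from typing import Dict, Optional

# Pattern inferred from the one log line quoted in this thread (an
# assumption, not a documented format):
#   "... mon[2017]: trap trap 0 from grp=default svc=DRBD_Status, sta=0"
TRAP_LOG_RE = re.compile(
    r"trap \S+ \d+ from grp=(?P<group>\S+) svc=(?P<service>[^,\s]+), "
    r"sta=(?P<status>\d+)"
)


def parse_trap_log(line: str) -> Optional[Dict[str, object]]:
    """Extract group, service and status from a trap log line.

    Returns None when the line does not look like a received-trap entry.
    """
    match = TRAP_LOG_RE.search(line)
    if match is None:
        return None
    status = int(match.group("status"))
    return {
        "group": match.group("group"),
        "service": match.group("service"),
        "status": status,
        # Per the explanation above, a status of 0 is the OK status,
        # so no alert is expected for such a trap.
        "is_ok": status == 0,
    }


if __name__ == "__main__":
    sample = (
        "Jul 11 16:24:49 monitor mon[2017]: trap trap 0 from "
        "grp=default svc=DRBD_Status, sta=0"
    )
    print(parse_trap_log(sample))
```

Feeding it the log lines for one service shows quickly whether the sender ever reports a non-zero status, and whether traps still arrive under the `default` group.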
RE: Problem getting traps to work correctly
--On Thursday, July 13, 2006 14:20:38 -0500 Tim Carr [EMAIL PROTECTED] wrote:
> Gotcha. I threw that in, and it seems to work correctly, except I can't tell if it is or not. I'm watching the log file, and it shows alerts being sent on an up/down event, but I'm not seeing alerts every 15s showing up when things are working correctly. Is that expected behavior?
>
> Thanks,
> Tim

I refer to the server that sends the traps as a slave server, and the server collecting the traps as the master server. Your master server should receive a trap on every status update on the slave server, i.e. a trap every 15s in your example. The master should only alert based on its alert behavior. This makes receiving updates via traps almost functionally equivalent to other monitor tests that you run on your master server.

If that's not the behavior you're seeing, please let me know.

-David
RE: Problem getting traps to work correctly
Here's a bit more information on it. I've got the slave server configured for multiple services, each of them using the redistribute option:

    redistribute alert trap.alert mainmonitor

On the master server, once I've reset it, none of those servers will ever go green/good in mon.cgi - they stay in blue/unchecked status. If I force a failure on one of those services, the master server will show that service going red. Once I re-enable the service, it will then show green. But the other services for that slave server will never change from unchecked state on the master server's mon.cgi. Also, if I then reset the mon process on the master server, all items will go back to blue and will not change again unless I force a failure.

Also, I'm logging all output to a file on both the master and slave server via these commands:

    logdir = /var/log/mon
    historicfile = /var/log/mon/history

In the slave server, the history file shows this for an outage event:

    alert Store13-2 DRBD_Status 1152819579 /opt/mon/alert.d/trap.alert (mainmonitor) DRBD_Not_Running
    upalert Store13-2 DRBD_Status 1152819594 /opt/mon/alert.d/trap.alert (mainmonitor) DRBD_Not_Running

On the master server, it will only log this for that same event:

    trapalert Store13-2 DRBD_Status 1152819578 /opt/mon/alert.d/mail.alert ([EMAIL PROTECTED]) DRBD_Not_Running

Thanks,
Tim

-----Original Message-----
From: David Nolan [mailto:[EMAIL PROTECTED]]
Sent: Thursday, July 13, 2006 2:32 PM
To: Tim Carr; mon@linux.kernel.org
Subject: RE: Problem getting traps to work correctly

--On Thursday, July 13, 2006 14:20:38 -0500 Tim Carr [EMAIL PROTECTED] wrote:
> Gotcha. I threw that in, and it seems to work correctly, except I can't tell if it is or not. I'm watching the log file, and it shows alerts being sent on an up/down event, but I'm not seeing alerts every 15s showing up when things are working correctly. Is that expected behavior?
>
> Thanks,
> Tim

I refer to the server that sends the traps as a slave server, and the server collecting the traps as the master server. Your master server should receive a trap on every status update on the slave server, i.e. a trap every 15s in your example. The master should only alert based on its alert behavior. This makes receiving updates via traps almost functionally equivalent to other monitor tests that you run on your master server.

If that's not the behavior you're seeing, please let me know.

-David
Re: Problem getting traps to work correctly
On 7/13/06, Tim Carr [EMAIL PROTECTED] wrote:
> Here's a bit more information on it. I've got the slave server configured for multiple services, each of them using the redistribute option:
>
>     redistribute alert trap.alert mainmonitor

If that's an exact quote, you've got the option wrong. It's just:

    redistribute trap.alert mainmonitor

> On the master server, once I've reset it, none of those servers will ever go green/good in mon.cgi - they stay in blue/unchecked status.

That sounds like you've still got the period-based trap configuration in place (which would match the typo above). If that's not true, and the line above was a typo in the email rather than in the configuration, then maybe the redistribute code in CVS is broken. Before I go investigate that possibility, please confirm whether the line above was an exact quote from your config file.

> In the slave server, the history file shows this for an outage event:
>
>     alert Store13-2 DRBD_Status 1152819579 /opt/mon/alert.d/trap.alert (mainmonitor) DRBD_Not_Running
>     upalert Store13-2 DRBD_Status 1152819594 /opt/mon/alert.d/trap.alert (mainmonitor) DRBD_Not_Running

This also indicates to me that your old alert/upalert configuration is still in place: redistribute does not generate history entries, because doing so would bloat the history file on the slave server.

-David
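Both checks described in this reply can be scripted on the slave server. The sketch below is a rough illustration, not part of Mon: the history-line layout is inferred only from the sample lines quoted in this thread, the assumption that `#` starts a comment in the config file should be verified against your Mon version, and the names `parse_history_line`, `trap_alert_history` and `redistribute_typos` are hypothetical helpers.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Tuple

# Layout inferred from the history lines quoted in this thread (an assumption):
#   <type> <group> <service> <epoch> <alert script> (<args>) <summary>
HISTORY_RE = re.compile(
    r"^(?P<kind>\S+)\s+(?P<group>\S+)\s+(?P<service>\S+)\s+(?P<time>\d+)\s+"
    r"(?P<script>\S+)\s+\((?P<args>[^)]*)\)\s*(?P<summary>.*)$"
)


class HistoryEntry(NamedTuple):
    kind: str
    group: str
    service: str
    time: int
    script: str
    args: str
    summary: str


def parse_history_line(line: str) -> Optional[HistoryEntry]:
    """Parse one history line, or return None if it does not match."""
    match = HISTORY_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["time"] = int(fields["time"])
    return HistoryEntry(**fields)


def trap_alert_history(lines: Iterable[str]) -> List[HistoryEntry]:
    """Return alert/upalert entries that were sent through trap.alert.

    Per the reply above, redistribute writes no history entries, so any
    hit suggests a period-based alert/upalert trap config is still active.
    """
    hits = []
    for line in lines:
        entry = parse_history_line(line)
        if (entry is not None and entry.kind in ("alert", "upalert")
                and entry.script.endswith("/trap.alert")):
            hits.append(entry)
    return hits


def redistribute_typos(config_text: str) -> List[Tuple[int, str]]:
    """Flag 'redistribute alert ...' lines (1-based line number, text).

    This would also flag an alert script literally named 'alert'.
    """
    problems = []
    for lineno, raw in enumerate(config_text.splitlines(), start=1):
        words = raw.split("#", 1)[0].split()  # assumes '#' starts a comment
        if len(words) >= 2 and words[0] == "redistribute" and words[1] == "alert":
            problems.append((lineno, raw.strip()))
    return problems


SAMPLE_HISTORY = [
    "alert Store13-2 DRBD_Status 1152819579 /opt/mon/alert.d/trap.alert (mainmonitor) DRBD_Not_Running",
    "upalert Store13-2 DRBD_Status 1152819594 /opt/mon/alert.d/trap.alert (mainmonitor) DRBD_Not_Running",
    "trapalert Store13-2 DRBD_Status 1152819578 /opt/mon/alert.d/mail.alert ([EMAIL PROTECTED]) DRBD_Not_Running",
]

SAMPLE_CONFIG = "\n".join([
    "watch Store13-2",
    "    service DRBD_Status",
    "        interval 15s",
    "        redistribute alert trap.alert mainmonitor",
])

if __name__ == "__main__":
    for entry in trap_alert_history(SAMPLE_HISTORY):
        print("history still shows", entry.kind, "via", entry.script)
    for lineno, text in redistribute_typos(SAMPLE_CONFIG):
        print("config line", lineno, "looks wrong:", text)
```

The extra word probably comes from reading the manpage synopsis `redistribute alert [arg...]` literally; there, `alert` appears to stand for the alert script name, which is why the working form is `redistribute trap.alert mainmonitor`.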