(Accidentally sent this only to Martin)
It does show up in the Events page-
* Date Mar 12 2016 00:13:22
* Host db1-primary
<https://netman-ng.autonetmobile.net:8443/status/hosts/detail?id=2320712>
* Service name backup_failure
* Service type Program
* Event Status failed
* Action Alert
Message
* '/usr/local/bin/check_backup' failed with exit status (0) -- no output
And the alert rule it falls under (can't select the text) -
hostgroup production
any service
failed
any event
then perform the following actions
execute program
(script for pagerduty)
This is the same ruleset used for all of the production services I
monitor - and they all work except this.
On 3/12/2016 9:46 AM, Martin Pala wrote:
Please check whether M/Monit's "Report -> Events" page contains the
related event. If it does, then the event was sent to M/Monit and
was processed by M/Monit's rule manager, but didn't match any rule ...
please check the "Admin -> Alerts" page in that case.
On 12 Mar 2016, at 18:40, Paul Theodoropoulos wrote:
They are managed in m/monit -
root@db1-primary: ~ # cat /etc/monit/monitrc
set daemon 300
with start delay 10
set logfile syslog facility log_daemon
set idfile /var/.monit.id
set statefile /var/.monit.state
set mailserver localhost
include /etc/monit/conf.d/*
set eventqueue basedir /var/spool/monit slots 1000
set mmonit https://monit:58Mnz22*jyNSO$Q&@fake.example.net:8443/collector
set httpd port 2812
allow localhost
allow fake.example.net
allow monit:XXXXXXXX
use address 10.124.74.115
allow 10.124.74.115
On 3/12/2016 8:38 AM, Martin Pala wrote:
Are the alerts on your system managed on Monit side or in M/Monit?
Best regards,
Martin
On 12 Mar 2016, at 01:01, Paul Theodoropoulos wrote:
I'm stumped. I have an ugly little script to alert me if today's
backup of a database is smaller than the one from yesterday (and
the day before). The script works properly, and I have a simple
monit rule in place to alert me if it fails. When monit checks, it
reports a failure; that is pushed up to my m/monit server, which
also logs the failure. From there, all alerts go to PagerDuty. But
I never get alerts from this check.
(Hopefully) all relevant output is below. Some strings have been
obfuscated. Note that I have the rule modified to falsely report a
failure, for testing.
root@db1-primary: /etc/monit/conf.d # cat /etc/debian_version
7.9
root@db1-primary: /etc/monit/conf.d # monit --version
This is Monit version 5.17
Built with ssl, without pam and with large files
Copyright (C) 2001-2016 Tildeslash Ltd. All Rights Reserved.
root@db1-primary: /etc/monit/conf.d # cat backups
check program backup_failure with path /usr/local/bin/check_backup
with timeout 15 seconds
not every "* 14 * * *"
#if status != 0 then alert
if status != 1 then alert
root@db1-primary: /etc/monit/conf.d # cat /usr/local/bin/check_backup
#!/bin/bash
BACKUP_DIR=/var/backups
cd ${BACKUP_DIR}
BUFILE=`date +%Y_%m_%d`_"group".sql.gz
YDAY_BUFILE=`date --date "1 days ago" +%Y_%m_%d`_"group".sql.gz
DAYBEFORE_YDAY_BUFILE=`date --date "2 days ago" +%Y_%m_%d`_"group".sql.gz
if [ -e "${BUFILE}" ];then
TDAYSIZE=`du ${BUFILE}|cut -f1`
YDAYSIZE=`du ${YDAY_BUFILE}|cut -f1`
DBDAYSIZE=`du ${DAYBEFORE_YDAY_BUFILE}|cut -f1`
if [ $YDAYSIZE -gt $DBDAYSIZE ];then
if [ $TDAYSIZE -gt $YDAYSIZE ];then
exit 0
fi
else
exit 1
fi
fi
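
The script above leaves some paths without an explicit exit: when today's file is missing, or when yesterday's dump grew but today's did not, bash falls off the end and the exit status is 0. A minimal sketch of the comparison with every path made explicit follows. It assumes the intent is "succeed only if sizes strictly grow day over day"; the function name backup_size_ok is hypothetical and not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script).
# Arguments: today's size, yesterday's size, the size from two days ago,
# all as integers (for example the first field of `du`).
# Returns 0 only when yesterday > day-before AND today > yesterday;
# returns 1 in every other case, so no path relies on an implicit status.
backup_size_ok() {
    local today=$1 yday=$2 daybefore=$3
    if [ "$yday" -gt "$daybefore" ] && [ "$today" -gt "$yday" ]; then
        return 0
    fi
    return 1
}
```

In the script this could replace the nested tests, e.g. `backup_size_ok "$TDAYSIZE" "$YDAYSIZE" "$DBDAYSIZE"; exit $?`, with a separate `exit 1` for the case where today's file does not exist. Whether a missing file should count as a failure is an assumption here.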
root@db1-primary:/etc/monit/conf.d # tail -1 /var/log/daemon.log
Mar 11 15:25:04 localhost monit[10562]: 'backup_failure' '/usr/local/bin/check_backup' failed with exit status (0) -- no output
root@db1-primary: ~ # monit status|tail -7
Program 'backup_failure'
status Status failed
monitoring status Monitored
last started Fri, 11 Mar 2016 15:42:36
last exit value 0
data collected Fri, 11 Mar 2016 15:42:36
What am I missing?
--
Paul Theodoropoulos
www.anastrophe.com
--
To unsubscribe:
https://lists.nongnu.org/mailman/listinfo/monit-general