I'm stumped. I have an ugly little script that alerts me if today's backup of a database is smaller than yesterday's (and the day before's). The script works properly, and I have a simple monit rule in place to alert me if it fails. When monit runs the check, it reports a failure; the failure is pushed up to my M/Monit server, which also logs it. From there, all alerts are supposed to go to PagerDuty. But I never get alerts from this check.

All the relevant output is (hopefully) below; some strings have been obfuscated. Note that, for testing, I have modified the rule to report a failure falsely: it triggers when the exit status is not 1, and the script currently exits with 0.

root@db1-primary: /etc/monit/conf.d # cat /etc/debian_version
7.9

root@db1-primary: /etc/monit/conf.d # monit --version
This is Monit version 5.17
Built with ssl, without pam and with large files
Copyright (C) 2001-2016 Tildeslash Ltd. All Rights Reserved.

root@db1-primary: /etc/monit/conf.d # cat backups
check program backup_failure with path /usr/local/bin/check_backup with timeout 15 seconds
not every "* 14 * * *"
#if status != 0 then alert
if status != 1 then alert

root@db1-primary: /etc/monit/conf.d # cat /usr/local/bin/check_backup
#!/bin/bash
BACKUP_DIR=/var/backups
cd ${BACKUP_DIR}
BUFILE=`date +%Y_%m_%d`_"group".sql.gz
YDAY_BUFILE=`date --date "1 days ago" +%Y_%m_%d`_"group".sql.gz
DAYBEFORE_YDAY_BUFILE=`date --date "2 days ago" +%Y_%m_%d`_"group".sql.gz
if [ -e "${BUFILE}" ];then
    TDAYSIZE=`du ${BUFILE}|cut -f1`
    YDAYSIZE=`du ${YDAY_BUFILE}|cut -f1`
    DBDAYSIZE=`du ${DAYBEFORE_YDAY_BUFILE}|cut -f1`
    if [ $YDAYSIZE -gt $DBDAYSIZE ];then
        if [ $TDAYSIZE -gt $YDAYSIZE ];then
            exit 0
        fi
    else
        exit 1
    fi
fi
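
For reference, here is a minimal sketch of the comparison the script is meant to perform, written so that every path returns an explicit status. It is not the deployed script: the function name compare_backup_sizes is hypothetical, the sizes in the example call are made up, and it assumes that a backup smaller than either of the previous two days should count as a failure.

```shell
#!/bin/bash
# Hypothetical helper, not part of the deployed check_backup script.
# Arguments: today's size, yesterday's size, the day before's size
# (plain integers, e.g. block counts as printed by du).
# Returns 1 if today's backup is smaller than either earlier one, else 0.
compare_backup_sizes() {
    local today=$1 yesterday=$2 day_before=$3
    if [ "$today" -lt "$yesterday" ] || [ "$today" -lt "$day_before" ]; then
        return 1    # backup shrank: report failure
    fi
    return 0        # backup is at least as large as both earlier ones
}

# Example call with made-up sizes; today's backup is the largest.
compare_backup_sizes 300 200 100
echo "exit status: $?"    # prints "exit status: 0"
```

With a check shaped like this, the commented-out rule (if status != 0 then alert) would be the one to use, since a non-zero status then always means a shrinking backup.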

root@db1-primary:/etc/monit/conf.d #  tail -1 /var/log/daemon.log
Mar 11 15:25:04 localhost monit[10562]: 'backup_failure' '/usr/local/bin/check_backup' failed with exit status (0) -- no output

root@db1-primary: ~ # monit status|tail -7
Program 'backup_failure'
  status                            Status failed
  monitoring status                 Monitored
  last started                      Fri, 11 Mar 2016 15:42:36
  last exit value                   0
  data collected                    Fri, 11 Mar 2016 15:42:36

What am I missing?

--
Paul Theodoropoulos
www.anastrophe.com
