Hi,
I have a set of Monit rules that perform checks on one service:
1. One check process rule (existence and port checks)
   1. does not exist for 5 cycles then start
   2. failed port XXXX for 6 times within 8 cycles then restart
   3. failed port YYYY for 6 times within 8 cycles then restart
   4. failed port ZZZZ for 6 times within 8 cycles then restart
2. Three check program rules with custom checks
   1. if status != 0 for 5 times within 10 cycles then restart
   2. if status != 0 for 5 times within 10 cycles then restart
   3. if status != 0 for 5 times within 10 cycles then restart
3. One check file rule to check the log content
   1. if content = "BIG ERROR" then restart
The start/stop rules are:
start program = "/bin/systemctl start myservice"
stop program = "/bin/systemctl stop myservice"
There are no dependencies at the Monit level, but the checks belong to the
same groups.
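
To make the setup concrete, here is roughly what the configuration looks
like. This is a sketch, not my exact file: the check names, the pidfile,
the script and log paths, and the group name are placeholders, and
XXXX/YYYY/ZZZZ stand for the real port numbers.

```
# Sketch only: names and paths below are hypothetical placeholders.
check process myservice with pidfile /var/run/myservice.pid
  group mygroup
  start program = "/bin/systemctl start myservice"
  stop program = "/bin/systemctl stop myservice"
  if does not exist for 5 cycles then start
  if failed port XXXX for 6 times within 8 cycles then restart
  if failed port YYYY for 6 times within 8 cycles then restart
  if failed port ZZZZ for 6 times within 8 cycles then restart

# One of the three custom checks; the other two have the same shape.
check program myservice_custom1 with path "/usr/local/bin/myservice_check1.sh"
  group mygroup
  start program = "/bin/systemctl start myservice"
  stop program = "/bin/systemctl stop myservice"
  if status != 0 for 5 times within 10 cycles then restart

# Log content check.
check file myservice_log with path /var/log/myservice/myservice.log
  group mygroup
  start program = "/bin/systemctl start myservice"
  stop program = "/bin/systemctl stop myservice"
  if content = "BIG ERROR" then restart
```

Each check carries its own start/stop pair, so each one can trigger a
restart independently of the others.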
The problem is that, due to several issues, I got a "restart" storm:
1. a port check failed -> restart issued
2. that led to an error in a custom script -> restart issued
3. the log content check lagged behind -> restart issued
Either MyService or its systemd configuration is not well designed, so I got
an "already bind exception" as systemd tried to start several instances at
the same time 🤔
So the port check failed again, systemd killed the wrong instance, MyService
was blocked, another restart was issued, and so on.
I had to shut down Monit to prevent further actions (I could also have used
monit -g group unmonitor), kill every instance of my service, start it
correctly, and then reactivate Monit.
Questions:
- Is there a native way to prevent Monit from issuing the same start/stop
commands within a defined time frame?
- Could Monit's dependency feature between checks help? I don't see how it
would.
- Any other hints or proposals (aside from increasing the "for N times
within T cycles" values to delay the restart)?
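
For reference, the closest built-in I know of is the restart-limit test,
sketched below. The thresholds (3 restarts, 10 cycles) and the pidfile path
are arbitrary placeholders. As far as I understand, it counts restarts per
check, so it would cap each check separately rather than coordinate all of
them.

```
# Sketch only: thresholds and pidfile path are hypothetical.
check process myservice with pidfile /var/run/myservice.pid
  start program = "/bin/systemctl start myservice"
  stop program = "/bin/systemctl stop myservice"
  if does not exist for 5 cycles then start
  # Stop monitoring this check after repeated restarts in a short window.
  if 3 restarts within 10 cycles then unmonitor
```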
Remark: exploring the systemd settings StartLimitIntervalSec and
StartLimitBurst might also help.
Best Regards.