Hello Ani,

I checked some of my logs and found a similar problem whenever the workload is very high (on an AIX system).

[MESZ May 8 05:29:14] error : 'D100SPUABC00' mem usage of 95.5% matches resource limit [mem usage > 95.0%]
[MESZ May 8 05:31:14] error : 'Manager' failed to get process data

>> I am running Monit 5.17.1 on Ubuntu 14.04, in some rare occasions
>> I see that following error in the log:
>>
>> 2020-11-17 18:47:22.347 monit[2954]: system statistic error -- cannot
>> read /proc/3560/stat

As long as this is a workload problem, you can configure Monit to delay a restart. With an additional "not exist" rule

    if not exist for 5 cycles then start

in the "check process" service, Monit will start/restart the service only after 5 consecutive failed checks. If Monit fails to get the process data just once, nothing will happen (I append a sample).

A suggestion only,
Lutz

Appendage: A sample of one of the service definitions in use:

check process Serv_server1 with pidfile "/usr/local/var/wlp/servers/.pid/server1.pid"
    start program "/usr/local/etc/monit/scripts/wlpserv.sh start" with timeout 180 seconds
    stop program "/usr/local/etc/monit/scripts/wlpserv.sh stop" with timeout 120 seconds
    restart program "/usr/local/etc/monit/scripts/wlpserv.sh restart" with timeout 300 seconds
    # if failed host hostname.local port 8901 then alert
    # if failed host hostname.local port 9901 then alert
    if not exist for 5 cycles then start
    if 5 restarts within 50 cycles then unmonitor

The "not exist" rule delays the start until five checks in a row have failed, and the "restarts" rule prevents endless recovery attempts.
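
To illustrate how the "for 5 cycles" condition behaves, here is a small Python sketch. It is only a simplified model, assuming the counter resets as soon as the process is seen again; it is not Monit's code, and the function name first_action_cycle and the sample lists are made up for the illustration.

```python
def first_action_cycle(exists_by_cycle, threshold=5):
    """Return the 1-based cycle in which a start would be triggered, or None.

    Simplified model (an assumption, not Monit's implementation): the action
    fires once the process has been missing for `threshold` consecutive cycles.
    """
    missing_streak = 0
    for cycle, exists in enumerate(exists_by_cycle, start=1):
        if exists:
            # Process data was readable again, so the streak starts over.
            missing_streak = 0
        else:
            missing_streak += 1
            if missing_streak >= threshold:
                return cycle
    return None


# Hypothetical sample data: True means the check could see the process.
blip = [True, True, False, True, True, True, True]        # one failed check
outage = [True, False, False, False, False, False, True]  # five failed checks in a row

print(first_action_cycle(blip))    # None: a single failure triggers nothing
print(first_action_cycle(outage))  # 6: fifth consecutive failure is in cycle 6
```

With the default poll interval this means a short load spike passes unnoticed, while a process that is really gone is started again after five cycles.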
