Re: Monit shows "statistic error"

Lutz Mader Sat, 21 Nov 2020 00:41:13 -0800

Hello Ani,
I checked some of my logs and found a similar problem whenever the
workload is very high (on an AIX system).


[MESZ May  8 05:29:14] error    : 'D100SPUABC00' mem usage of 95.5% matches resource limit [mem usage > 95.0%]
[MESZ May  8 05:31:14] error    : 'Manager' failed to get process data

>> I am running Monit 5.17.1 on Ubuntu 14.04, in some rare occasions
>> I see that following error in the log:
>>
>> 2020-11-17 18:47:22.347 monit[2954]: system statistic error -- cannot read /proc/3560/stat

If this is a workload problem, you can configure Monit to delay a
restart. With an additional "not exist" rule

  if not exist for 5 cycles then start

in the "check process" service, Monit will only start/restart the service
after 5 failed checks in a row. If Monit fails to get the process data
just once, nothing will happen (I have appended a sample).
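To illustrate what "for 5 cycles" changes, here is a small Python model of the rule. It is a simplified sketch, not Monit's implementation: the function name `cycles_triggering_start` is hypothetical, and it assumes the start fires once per unbroken run of failed checks (what Monit does on later cycles of the same outage, e.g. after a failed start, is not modelled).

```python
# Simplified model (NOT Monit's source code) of the rule
#   if not exist for 5 cycles then start
# The action fires only after `required` failed checks in a row;
# any successful check resets the count.

def cycles_triggering_start(checks, required=5):
    """Return the 1-based cycle numbers at which the start action would fire.

    checks   -- sequence of booleans, True if the process was found in that cycle
    required -- consecutive failed cycles needed (5 in the rule above)
    """
    triggered = []
    failures = 0
    for cycle, found in enumerate(checks, start=1):
        if found:
            failures = 0          # one good check clears the error state
            continue
        failures += 1
        if failures == required:  # assumption: fire once per run of failures
            triggered.append(cycle)
    return triggered


# A single failed read (like "cannot read /proc/.../stat") is ignored:
print(cycles_triggering_start([True, False, True, True]))  # []
# Five failures in a row trigger a start on the fifth one:
print(cycles_triggering_start([True] + [False] * 5))        # [6]
```

The point of the model is that a one-off "statistic error" under heavy load never reaches the threshold, so no restart is issued.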

A suggestion only,
Lutz

Appendix:
A sample of one of the service definitions I use:

check process Serv_server1 with pidfile "/usr/local/var/wlp/servers/.pid/server1.pid"
  start program "/usr/local/etc/monit/scripts/wlpserv.sh start" with timeout 180 seconds
  stop program "/usr/local/etc/monit/scripts/wlpserv.sh stop" with timeout 120 seconds
  restart program "/usr/local/etc/monit/scripts/wlpserv.sh restart" with timeout 300 seconds
#  if failed host hostname.local port 8901 then alert
#  if failed host hostname.local port 9901 then alert
  if not exist for 5 cycles then start
  if 5 restarts within 50 cycles then unmonitor

The "not exist" rule delays the start until five checks have failed, and
the "restarts" rule prevents an endless recovery loop.
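The effect of "if 5 restarts within 50 cycles then unmonitor" can be sketched as a sliding window over the cycles in which a (re)start happened. This is a simplified Python model, not Monit's code: the function name `unmonitor_cycle` is hypothetical, and it assumes the window covers the current cycle and the 49 before it.

```python
from collections import deque

# Simplified model (NOT Monit's source code) of the rule
#   if 5 restarts within 50 cycles then unmonitor

def unmonitor_cycle(restart_cycles, limit=5, window=50):
    """Return the cycle at which monitoring would stop, or None.

    restart_cycles -- sorted 1-based cycle numbers at which a (re)start happened
    limit, window  -- the "5 restarts within 50 cycles" thresholds
    """
    recent = deque()
    for cycle in restart_cycles:
        recent.append(cycle)
        # Assumption: only restarts in (cycle - window, cycle] count.
        while recent and recent[0] <= cycle - window:
            recent.popleft()
        if len(recent) >= limit:
            return cycle  # too many restarts: stop trying to recover
    return None


print(unmonitor_cycle([10, 20, 30, 40, 50]))       # 50
print(unmonitor_cycle([1, 20, 40, 60, 80, 100]))   # None
```

Restarts spread out over time never fill the window, while a service that keeps dying is unmonitored instead of being restarted forever.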
