I will try lowering the interval, but in this case host2 has been down for more than 24 hours without an alert being raised by monit or m/monit.
--
Andrew Fant | Systems Administrator [email protected] | Lei Shi Lab, NIH/NIDA/IRP (443)740-2849 |

From: "[email protected]" <[email protected]>
Reply-To: This is the general mailing list for monit <[email protected]>
Date: Friday, March 8, 2019 at 4:06 PM
To: This is the general mailing list for monit <[email protected]>
Subject: Re: monit not catching failed ping test

The interval between checks is 120 seconds => it can take up to ~2 minutes to detect an error with this setting. You can lower the interval to, for example, 5 seconds for faster error detection.

Best regards,
Martin

On 8 Mar 2019, at 22:00, Fant, Andrew (NIH/NIDA) [E] <[email protected]> wrote:

In the monitrc file, I have:

    set daemon 120

As for the monit -vI output, it has 22 remote host checks in total. A shortened, anonymized copy of it is:

    Adding 'allow localhost' -- host resolved to [::ffff:127.0.0.1]
    Adding credentials for user 'admin'
    Runtime constants:
     Control file       = /etc/monitrc
     Log file           = syslog
     Pid file           = /etc/monit/monit.pid
     Id file            = /etc/monit/monit.id
     State file         = /etc/monit/monit.state
     Debug              = True
     Log                = True
     Use syslog         = True
     Is Daemon          = True
     Use process engine = True
     Limits             = {
                        =   programOutput:     512 B
                        =   sendExpectBuffer:  256 B
                        =   fileContentBuffer: 512 B
                        =   httpContentBuffer: 1 MB
                        =   networkTimeout:    5 s
                        =   programTimeout:    5 m
                        =   stopTimeout:       30 s
                        =   startTimeout:      30 s
                        =   restartTimeout:    30 s
                        = }
     On reboot          = start
     Poll time          = 120 seconds with start delay 0 seconds
     Event queue        = base directory /var/monitor with 1000 slots
     M/Monit(s)         = http://[host1.local]:8080/collector with timeout 5 s with credentials
     Start monit httpd  = True
     httpd bind address = localhost
     httpd portnumber   = 2812
     httpd signature    = Enabled
     httpd auth. style  = Basic Authentication and Host/Net allow list

    The service list contains the following entries:

    System Name         = host1
     Monitoring mode    = active
     On reboot          = start

    Remote Host Name    = host2_ping
     Address            = 192.168.1.2
     Monitoring mode    = active
     On reboot          = start
     Ping               = if failed [count 3 size 64 with timeout 5 s] then alert

    -------------------------------------------------------------------------------

Hopefully this will be of some use.

From: "[email protected]" <[email protected]>
Reply-To: This is the general mailing list for monit <[email protected]>
Date: Friday, March 8, 2019 at 3:26 PM
To: This is the general mailing list for monit <[email protected]>
Subject: Re: monit not catching failed ping test

Hello,

monit checks the service at intervals given by the "set daemon <x>" setting. If the interval between checks is long, or the check is blocked by some service timeout/action, then the interval can be longer.

Please can you check the "set daemon" setting and run monit in debug mode?

1.) stop monit
2.) monit -vI

Best regards,
Martin

On 8 Mar 2019, at 16:49, Fant, Andrew (NIH/NIDA) [E] <[email protected]> wrote:

Good morning.

I have a small monitoring setup with m/monit 3.7.2, using monit 5.25.2 as the agent. There are a couple of systems that I cannot install monit on but whose downtime I still need to be aware of, so I have added them as ping checks in the monitrc on the host where I installed m/monit. Yesterday, one of those remote systems went down, but monit and m/monit didn’t report an alert for it and still have its status as OK.

Using anonymized information, the entry in the monitrc on host1 is:

    CHECK HOST host2_ping with ADDRESS 192.168.1.2
        IF FAILED ping THEN ALERT

And from the command line on host1:

    host1% monit status host2_ping
    Monit 5.25.2 uptime: 48d 19h 8m

    Remote Host 'host2_ping'
      status               OK
      monitoring status    Monitored
      monitoring mode      active
      on reboot            start
      ping response time   -
      data collected       Fri, 08 Mar 2019 10:41:33

But:

    host1% ping host2
    PING host2.example.org (192.168.1.2) 56(84) bytes of data.
    From host1.example.org (192.168.1.1) icmp_seq=1 Destination Host Unreachable
    From host1.example.org (192.168.1.1) icmp_seq=2 Destination Host Unreachable
    From host1.example.org (192.168.1.1) icmp_seq=3 Destination Host Unreachable

Clearly there is a disconnect between the OS-provided ping utility and what monit is seeing. I’m sure that it’s probably a simple error in configuration, but I am not seeing what I did wrong. Can someone please set me on the correct path?

Thank you

-- To unsubscribe: https://lists.nongnu.org/mailman/listinfo/monit-general
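
For reference, the two settings discussed in this thread could be combined in monitrc roughly as sketched below. This is only an illustration under stated assumptions: the 5-second poll interval is the example value Martin suggested, and the explicit count, size and timeout values are simply the ones monit -vI already reported for this check, written out so they are visible in the control file. As the most recent reply notes, host2 had been down for more than 24 hours, so a shorter interval on its own would not be expected to explain the missed alert.

```
# Sketch only: poll every 5 seconds instead of every 120 (Martin's example value).
set daemon 5

# Same check as in the original post, with the ping parameters spelled out.
# The values mirror what "monit -vI" printed: count 3, size 64, timeout 5 s.
check host host2_ping with address 192.168.1.2
    if failed ping count 3 size 64 with timeout 5 seconds then alert
```

After changing the control file, the syntax can be checked with "monit -t" before restarting monit.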
