Hi Martin,

Were you able to replicate the problem by following my steps? Should we file a bug for this at https://savannah.nongnu.org/bugs/?func=additem&group=monit
Thanks!
-Nestor

On Wed, Sep 26, 2012 at 9:06 AM, Nestor Urquiza <[email protected]> wrote:

> Hi Martin,
>
> Thanks for the clarification on the "recovery alert" side.
>
> On the "false positive" side it is probably OK as you described the
> current behavior: To interrupt all checks, one by one without checking
> first if there is at least one of them in progress until that one check is
> to be stopped. As you said monit will wait for that one to complete and
> then unmonitor it after.
>
> However the "false positive" issue still remains, and it is that under
> the described circumstances monit will state the remote port is down when
> it is not down.
>
> Were you able to recreate the problem on your side?
>
> Best regards,
> -Nestor
>
> On Wed, Sep 26, 2012 at 7:49 AM, Martin Pala <[email protected]> wrote:
>
>> Hi,
>>
>> when the monitoring is disabled (unmonitor), or monit is stopped then the
>> service's error state is reset. When the monitoring is enabled (or monit
>> started) again and the service is running, no recovery alert is sent as the
>> service monitoring starts from clean state.
>>
>> The unmonitor is performed at the start of the service check - if by
>> coincidence the test of some service is in progress, it allows it to
>> complete the test of that single service and doesn't interrupt the pending
>> check. When monit goes to next service (in the same cycle) it disables the
>> monitoring. I think it's OK to let the pending test complete - it's kind of
>> corner case with low impact.
>>
>> Regards,
>> Martin
>>
>> On Sep 24, 2012, at 4:14 PM, Nestor Urquiza <[email protected]> wrote:
>>
>> Hi guys,
>>
>> Not sure if this is a problem in other OSs as well but I believe I have
>> found a bug in monit 5.5 which at least for Solaris 10 is failing to
>> synchronize unmonitor actions with ongoing checks. Here is how to recreate
>> it (tested on two different physical Solaris boxes, Intel):
>>
>> 1. Configure monit to check every minute. Create several instances like
>> the below, checking several external ports and servers:
>>
>> check host myhost with address myhost
>>   if failed port myport type tcp with timeout 15 seconds
>>   then alert
>>
>> 2. Issue the below command exactly by the time monit runs (when the clock
>> is giving hh:mm:59):
>>
>> monit unmonitor all
>>
>> 3. Randomly you get an alert for at least one of the host/port
>> combinations even though the host/port is actually available. As an example:
>>
>> Action: alert, Description: connection failed, INET[mssql:1433] via TCP
>> is not ready for i|o -- Interrupted system call, Service: ptrsvr, Tested
>> From Host: myhost
>>
>> 4. After issuing 'monit monitor all' no alert about the service being
>> back up is sent but 'monit status' does show the service is up.
>>
>> IMO monit has a bug where basically it does not synchronize the calls to
>> unmonitor and the checks to be performed. If monit receives "unmonitor all"
>> it should: (wait for all current checks to finish OR cancel them AND ignore
>> any alert messages to be sent).
>>
>> Makes sense?
>>
>> Thanks!
>>
>> -Nestor
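
To make the proposal in the quoted report concrete, here is a rough sketch in Python of the "ignore any alert messages" half of it: a result from a check that was still in flight when "unmonitor all" arrived is discarded instead of being turned into an alert. This is only a toy model and not monit's actual code; the class and method names (CheckRunner, monitor, unmonitor_all, report) are hypothetical and exist only for this illustration.

```python
import threading


class CheckRunner:
    """Toy model of a monitoring cycle (hypothetical, not monit's code)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._monitored = set()   # services currently being monitored
        self.alerts = []          # (service, detail) pairs that would be sent

    def monitor(self, service):
        with self._lock:
            self._monitored.add(service)

    def unmonitor_all(self):
        # Taking the lock orders this call against concurrent report() calls.
        with self._lock:
            self._monitored.clear()

    def report(self, service, ok, detail=""):
        """Record a finished check; return True if an alert was queued."""
        with self._lock:
            if service not in self._monitored:
                # The check finished (or was interrupted) after the service
                # was unmonitored: drop the result instead of alerting.
                return False
            if not ok:
                self.alerts.append((service, detail))
                return True
            return False
```

With this model, a failure reported while the service is monitored queues an alert, while a failure such as "Interrupted system call" reported after unmonitor_all() is silently dropped, which is the behavior the report asks for.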
-- To unsubscribe: https://lists.nongnu.org/mailman/listinfo/monit-general
