Just to report that this also happens when monit turns monitoring back on ('monit monitor all'), for example:

[EDT Sep 24 15:18:34] info : monit daemon with PID 17391 awakened
[EDT Sep 24 15:18:34] info : 'server1' monitor action done
[EDT Sep 24 15:18:34] info : Awakened by User defined signal 1
[EDT Sep 24 15:18:34] info : 'server2' monitor on user request
[EDT Sep 24 15:18:34] info : monit daemon with PID 17391 awakened
[EDT Sep 24 15:18:34] info : 'server2' monitor action done
[EDT Sep 24 15:18:34] info : 'server3' monitor on user request
[EDT Sep 24 15:18:34] info : monit daemon with PID 17391 awakened
[EDT Sep 24 15:18:34] error : 'server1' connection failed, INET[server1:80] via TCP is not ready for i|o -- Interrupted system call
[EDT Sep 24 15:18:34] info : 'server6' monitor on user request
[EDT Sep 24 15:18:34] info : monit daemon with PID 17391 awakened
[EDT Sep 24 15:18:34] info : 'server4' monitor on user request
[EDT Sep 24 15:18:34] info : monit daemon with PID 17391 awakened
[EDT Sep 24 15:18:34] info : 'server5' monitor on user request
[EDT Sep 24 15:18:34] info : monit daemon with PID 17391 awakened
[EDT Sep 24 15:18:35] info : 'server3' monitor action done
[EDT Sep 24 15:18:35] info : 'server6' monitor action done
[EDT Sep 24 15:18:35] info : 'server4' monitor action done
[EDT Sep 24 15:18:35] info : 'server5' monitor action done
[EDT Sep 24 15:18:35] info : Awakened by User defined signal 1
[EDT Sep 24 15:18:35] info : 'server1' connection succeeded to INET[server1:80] via TCP

On Mon, Sep 24, 2012 at 10:14 AM, Nestor Urquiza <[email protected]> wrote:

> Hi guys,
>
> Not sure if this is a problem in other OSs as well but I believe I have
> found a bug in monit 5.5 which at least for Solaris 10 is failing to
> synchronize unmonitor actions with ongoing checks. Here is how to recreate
> (tested in two different physical Solaris boxes (Intel)
>
> 1. Configure monit to check every minute. Create several instances like
> the below, checking several external ports and servers:
>
>     check host myhost with address myhost
>       if failed port myport type tcp with timeout 15 seconds
>       then alert
>
> 2. Issue the below command exactly by the time monit runs (when the clock
> is giving hh:mm:59):
>
>     monit unmonitor all
>
> 3. Randomly you get an alert for at least one of the host/port combination
> even though the host/port is actually available. As an example:
>
> Action: alert, Description: connection failed, INET[mssql:1433] via TCP is
> not ready for i|o -- Interrupted system call, Service: ptrsvr, Tested From
> Host: myhost
>
> 4. After issuing 'monit monitor all' no alert about the service being back
> up is sent but 'monit status' does show the service is up.
>
> IMO monit has a bug where basically it does not synchronize the calls to
> unmonitor and the checks to be performed. If monit receives "unmonitor all"
> it should: (wait for all current checks to finish OR cancel them AND ignore
> any alert messages to be sent).
>
> Makes sense?
>
> Thanks!
>
> -Nestor
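
For what it's worth, the "ignore any alert messages" half of the proposal quoted above could look roughly like the sketch below. It is Python rather than monit's C, and it is not how monit is actually implemented: `Service`, `run_check`, and the probe callables are hypothetical names made up for illustration. The idea is only that a result arriving after the service was unmonitored gets dropped instead of turning into an alert.

```python
import threading


class Service:
    """Hypothetical stand-in for one 'check host' entry (not a monit type)."""

    def __init__(self, name):
        self.name = name
        self.monitored = True
        self.lock = threading.Lock()
        self.alerts = []

    def unmonitor(self):
        # What an 'unmonitor' request would flip, under the same lock
        # that guards result reporting.
        with self.lock:
            self.monitored = False

    def report(self, ok, detail):
        # Called when a check finishes. If monitoring was switched off
        # while the check was in flight, the result is discarded.
        with self.lock:
            if not self.monitored:
                return False
            if not ok:
                self.alerts.append(detail)
            return True


def run_check(service, probe):
    """Run one check; return True if its result was accepted."""
    with service.lock:
        if not service.monitored:
            return False  # nothing to do for an unmonitored service
    try:
        probe()  # the lock is NOT held during the (possibly slow) probe
        ok, detail = True, "connection succeeded"
    except InterruptedError as exc:
        ok, detail = False, "connection failed -- %s" % exc
    return service.report(ok, detail)
```

With this shape, a probe that is interrupted because the daemon was woken for an unmonitor request produces no alert, while the same failure on a service that is still monitored is reported as usual.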
