Hi. I am controlling monit through a PHP web interface, issuing start and stop commands to monit on an arbitrary set of processes.

I am seeing processes which are actually running but become unmonitored. The scripts I use make some effort to run one and only one instance of a process and check that the process ID actually corresponds to the monitored process. In an effort to track down this problem, I think the following may be the root cause...

#######################################################

There appears to be a race condition when issuing monit start commands on processes that are dependent on each other. To illustrate: given that processes b1 to b8 are dependent on process a, if I stop all processes and then issue a separate start command for each process, I see in monit.log that process b1 is being started twice: once as a result of the dependency on process a, and once because of an explicit start command.

I have attached some files to help reproduce this problem.

As an aside, note that process "a" changes its status in the web interface from "running" to "Changed" unnecessarily. I have made the poll time 60 seconds to illustrate this.
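For the duplicate start of b1 described above, here is a minimal sketch of one way to spot it automatically, assuming log lines in the same format as the monit.log excerpt below. The function name `duplicate_starts` and the regular expression are illustrative only and are not part of monit.

```python
import re
from collections import Counter

# Matches monit.log lines such as:
#   [ Jan 27 11:00:05] 'b1' start: /root/monit/run.sh
# (format assumed from the log excerpt in this message)
START_RE = re.compile(
    r"\[\s*(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\]\s+'([^']+)' start:"
)

def duplicate_starts(log_text):
    """Return {(timestamp, service): count} for every service that has
    more than one 'start' entry within the same one-second timestamp."""
    counts = Counter(START_RE.findall(log_text))
    return {key: n for key, n in counts.items() if n > 1}

# A few lines taken from the excerpt, used here as sample input.
sample = """\
[ Jan 27 11:00:05] 'b2' start: /root/monit/run.sh
[ Jan 27 11:00:05] Monitoring enabled -- service b2
[ Jan 27 11:00:05] 'b1' start: /root/monit/run.sh
[ Jan 27 11:00:05] Monitoring enabled -- service b1
[ Jan 27 11:00:05] 'b1' start: /root/monit/run.sh
"""

print(duplicate_starts(sample))
```

Grouping by timestamp as well as service name keeps an ordinary stop/start cycle at different times from being reported as a duplicate.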
########
[ Jan 27 10:59:55] Monitoring disabled -- service b8
[ Jan 27 10:59:55] 'b8' stop: /root/monit/run.sh
[ Jan 27 10:59:56] Monitoring disabled -- service b7
[ Jan 27 10:59:56] 'b7' stop: /root/monit/run.sh
[ Jan 27 10:59:57] Monitoring disabled -- service b6
[ Jan 27 10:59:57] 'b6' stop: /root/monit/run.sh
[ Jan 27 10:59:59] Monitoring disabled -- service b5
[ Jan 27 10:59:59] 'b5' stop: /root/monit/run.sh
[ Jan 27 11:00:00] Monitoring disabled -- service b4
[ Jan 27 11:00:00] 'b4' stop: /root/monit/run.sh
[ Jan 27 11:00:01] Monitoring disabled -- service b3
[ Jan 27 11:00:01] 'b3' stop: /root/monit/run.sh
[ Jan 27 11:00:02] Monitoring disabled -- service b2
[ Jan 27 11:00:02] 'b2' stop: /root/monit/run.sh
[ Jan 27 11:00:03] Monitoring disabled -- service b1
[ Jan 27 11:00:03] 'b1' stop: /root/monit/run.sh
[ Jan 27 11:00:04] Monitoring disabled -- service a
[ Jan 27 11:00:04] 'a' stop: /root/monit/run.sh
[ Jan 27 11:00:05] 'a' start: /root/monit/run.sh
[ Jan 27 11:00:05] Monitoring enabled -- service a
[ Jan 27 11:00:05] 'b8' start: /root/monit/run.sh
[ Jan 27 11:00:05] Monitoring enabled -- service b8
[ Jan 27 11:00:05] 'b7' start: /root/monit/run.sh
[ Jan 27 11:00:05] Monitoring enabled -- service b7
[ Jan 27 11:00:05] 'b6' start: /root/monit/run.sh
[ Jan 27 11:00:05] Monitoring enabled -- service b6
[ Jan 27 11:00:05] 'b5' start: /root/monit/run.sh
[ Jan 27 11:00:05] Monitoring enabled -- service b5
[ Jan 27 11:00:05] 'b4' start: /root/monit/run.sh
[ Jan 27 11:00:05] Monitoring enabled -- service b4
[ Jan 27 11:00:05] 'b3' start: /root/monit/run.sh
[ Jan 27 11:00:05] Monitoring enabled -- service b3
[ Jan 27 11:00:05] 'b2' start: /root/monit/run.sh
[ Jan 27 11:00:05] Monitoring enabled -- service b2
[ Jan 27 11:00:05] 'b1' start: /root/monit/run.sh
[ Jan 27 11:00:05] Monitoring enabled -- service b1
[ Jan 27 11:00:05] 'b1' start: /root/monit/run.sh
[ Jan 27 11:00:05] monit: Process already running -- process b2
[ Jan 27 11:00:05] monit: Process already running -- process b3
[ Jan 27 11:00:05] monit: Process already running -- process b4
[ Jan 27 11:00:05] monit: Process already running -- process b5
[ Jan 27 11:00:05] monit: Process already running -- process b6
[ Jan 27 11:00:05] monit: Process already running -- process b7
[ Jan 27 11:00:05] monit: Process already running -- process b8
########

Regards,
Peter Holdaway

--
Peter Holdaway
TechnoCom Corporation
2030 Corte del Nogal, Suite 200
Carlsbad, CA 92011
Phone: 760 438 5115 ext 132
Fax: 760 438 5815
Email: [EMAIL PROTECTED]
http://www.technocom-wireless.com/
Attachments: monitrc, restart_all.sh, run.sh
