Glad to hear it! Glad to help. Also, thanks for pointing out the "every" directive. I think I might want to start using it...
Thanks,
Aaron

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Martin Pala
Sent: Thursday, December 21, 2006 5:21 AM
To: The monit developer list
Subject: Re: <service> start Generates email noise

Thanks :)

I have reproduced the problem; it is fixed in CVS now. It was caused in validate.c:check_skip() by the order of the s->def_every and s->visited tests. The correct order is:

--8<--
if(s->visited) {
  DEBUG("'%s' check skipped -- service already handled "
        "in a dependency chain\n", s->name);
  return TRUE;
}
if(!s->def_every)
  return FALSE;
--8<--

When no 'every' statement was used, check_skip() returned FALSE and monit performed the service check in the same cycle in which the user-requested start action was called. Because the just-started process was not running yet, the process existence test failed and monit performed the restart action in that same cycle as well.

I had trouble reproducing it, since I used an 'every' statement in my test configuration, which masked the bug.

Thanks for your help :)
Martin

Aaron Scamehorn wrote:
> Hi Martin,
>
> Thanks for the detailed description...
>
> I've attached my monitrc file. Obviously the executable (logClient) is an in-house exe, but that shouldn't matter, should it?
>
> I wonder if it is due to the amount of time it takes for the exe to update its pid file...
>
> It seems as though it has something to do with wait_start starting its own thread to wait???
>
> The scenario is rather simple. I can reproduce it by stopping the service, then issuing a monit <service> start via the CLI.
>
> If this is not enough detail, or if I can help out more, please let me know.
>
> Thanks,
> Aaron
>
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Martin Pala
> Sent: Wednesday, December 06, 2006 8:50 AM
> To: The monit developer list
> Subject: Re: <service> start Generates email noise
>
> I have looked at it...
>
> I will first explain how it works in monit 4.8.2.
>
> Two threads come into play:
> - the http thread
> - the monitoring thread
>
> The http thread processes the user-requested actions (posted either via the CLI or the HTML interface). The action is scheduled in http/cervlet.c:handle_action() by setting the s->doaction flag of the appropriate service. When no action is scheduled, the s->doaction flag is set to ACTION_IGNORE (in p.y during service initialization, or in validate.c after the action has been handled). In addition, Run.doaction is set to TRUE just to signal that there is some scheduled action in the service tree. The http thread then wakes up the main monitoring thread to speed up the action handling.
>
> The main thread then checks in validate.c:validate() whether the Run.doaction flag is set, since user actions take precedence. If it is set, the main thread walks the service tree, performs the scheduled s->doaction for each service using control_service(), and then resets the s->doaction flag to ACTION_IGNORE. All of this is done under mutex and signal protection, so it cannot be interrupted and no race condition can occur. The main thread is the only thread that can call control_service() and physically start/restart/etc. the service. control_service() also sets the s->visited flag.
>
> The second service loop is then evaluated: monit walks the service tree, and for each service it locks the mutex and blocks signals. If the service was not already handled in the same cycle (the s->visited flag is tested in check_skip()), monit checks the s->doaction flag again, to improve the response time for services whose action was scheduled between the first and second loops of the same cycle. If the flag is set, monit performs the action; otherwise it checks the service.
>
The http thread just sets > the flag, whereas the monitoring thread handle the action. From theory > point > > of view, i think no race condition could occure. > > I tried to reproduce the problem (official monit-4.8.2 release) > without success. > > Can you prepare simple monit configuration and procedure for problem > reproduction? > > Thanks, > Martin > > > > Aaron Scamehorn wrote: > >>Hi Martin, >> >>Actually I think you've now got one thread doing an ACTION_START, and >>another doing an ACTION_RESTART on the exact same service. >> >>It is the ACTION_RESTART that is generating what I perceived to be >>extraneous emails. >> >>It looks like the do_wakeupcall that you added to >>http/cervlet.c:handle_action() is the culprit. Without it, I don't > > get > >>the ACTION_RESTART problem. >> >>Of course you need this now, or else it takes Poll Time to actully >>respond to the HTTP events, which is what you were trying to speed up > > in > >>the first place. >> >>Here is the log output, with a bunch of extra messages, including >>pthread_t. 
>>
>>3086927552 [CST Dec 1 14:53:25] debug : 'data_dir' filesystem flags has not changed since last cycle
>>3086927552 [CST Dec 1 14:53:25] debug : 'data_dir' space usage check passed [current space usage=10.6%]
>>3086924720 [CST Dec 1 14:53:26] info : monit daemon at 24175 awakened
>>3086927552 [CST Dec 1 14:53:26] info : Awakened by User defined signal 1
>>3086927552 [CST Dec 1 14:53:26] debug : control_service: ACTION_START for 'LogClient'
>>3086927552 [CST Dec 1 14:53:26] debug : control_service: ACTION_START Util_isProcessRunning for 'LogClient'
>>3086927552 [CST Dec 1 14:53:26] debug : 'LogClient' Error testing process id [24220] -- No such process
>>3086927552 [CST Dec 1 14:53:26] debug : do_start: Util_isProcessRunning for 'LogClient'
>>3086927552 [CST Dec 1 14:53:26] debug : 'LogClient' Error testing process id [24220] -- No such process
>>3086927552 [CST Dec 1 14:53:26] info : 'LogClient' start: /cogcap/ccts/bin/logclnt
>>3086927552 [CST Dec 1 14:53:26] debug : 'LogClient' Error testing process id [24220] -- No such process
>>3086927552 [CST Dec 1 14:53:26] debug : Monitoring enabled -- service LogClient
>>3086927552 [CST Dec 1 14:53:26] debug : check_process: calling Util_isProcessRunning for 'LogClient'
>>3086927552 [CST Dec 1 14:53:26] debug : 'LogClient' Error testing process id [24220] -- No such process
>>3086927552 [CST Dec 1 14:53:26] error : 'LogClient' process is not running
>>3086927552 [CST Dec 1 14:53:26] debug : Does not exist notification is NOT sent to [EMAIL PROTECTED]
>>3086927552 [CST Dec 1 14:53:26] debug : Does not exist notification is sent to [EMAIL PROTECTED]
>>3076434864 [CST Dec 1 14:53:26] debug : static void* wait_start for 'LogClient'
>>3076434864 [CST Dec 1 14:53:26] debug : 1) wait_start: calling Util_isProcessRunning for 'LogClient', max_tries= 29
>>3076434864 [CST Dec 1 14:53:26] debug : 'LogClient' Error testing process id [24220] -- No such process
>>3086927552 [CST Dec 1 14:53:26] debug : control_service: ACTION_RESTART for 'LogClient'
>>3086927552 [CST Dec 1 14:53:26] info : 'LogClient' trying to restart
>>3086927552 [CST Dec 1 14:53:26] debug : Monitoring disabled -- service LogClient (stop)
>>3086927552 [CST Dec 1 14:53:26] debug : do_stop: Util_isProcessRunning for 'LogClient'
>>3086927552 [CST Dec 1 14:53:26] debug : 'LogClient' Error testing process id [24220] -- No such process
>>3086927552 [CST Dec 1 14:53:26] debug : 'data_dir' filesystem flags has not changed since last cycle
>>3086927552 [CST Dec 1 14:53:26] debug : 'data_dir' space usage check passed [current space usage=10.6%]
>>3076434864 [CST Dec 1 14:53:27] debug : 1) wait_start: calling Util_isProcessRunning for 'LogClient', max_tries= 28
>>3076434864 [CST Dec 1 14:53:27] debug : 2) wait_start: calling Util_isProcessRunning for 'LogClient'
>>3086927552 [CST Dec 1 14:53:56] debug : check_process: calling Util_isProcessRunning for 'LogClient'
>>3086927552 [CST Dec 1 14:53:56] info : 'LogClient' process is running with pid 24375
>>3086927552 [CST Dec 1 14:53:56] debug : Exists notification is NOT sent to [EMAIL PROTECTED]
>>3086927552 [CST Dec 1 14:53:56] debug : Exists notification is sent to [EMAIL PROTECTED]
>>3086927552 [CST Dec 1 14:53:56] debug : 'LogClient' zombie check passed [status_flag=0000]
>>3086927552 [CST Dec 1 14:53:56] debug : 'LogClient' loadavg(5min) check passed [current loadavg(5min)=0.2]
>>3086927552 [CST Dec 1 14:53:56] debug : 'LogClient' cpu usage check passed [current cpu usage=0.0%]
>>3086927552 [CST Dec 1 14:53:56] debug : 'LogClient' mem amount check passed [current mem amount=2764kB]
>>3086927552 [CST Dec 1 14:53:56] debug : 'data_dir' filesystem flags has not changed since last cycle
>>3086927552 [CST Dec 1 14:53:56] debug : 'data_dir' space usage check passed [current space usage=10.6%]
>>
>>
>>-----Original Message-----
>>From: [EMAIL PROTECTED]
>>[mailto:[EMAIL PROTECTED] On Behalf Of Martin Pala
>>Sent: Thursday, November 30, 2006 4:20 PM
>>To: The monit developer list
>>Subject: Re: <service> start Generates email noise
>>
>>Hello,
>>
>>this behavior isn't a bug: the 'nonexist' event type has positive and negative variants:
>>
>>  Does not exist (positive 'nonexist')
>>
>>    vs.
>>
>>  Exists (negative 'nonexist')
>>
>>The alert statement can filter only on the general event type, not on the particular polarity (there is no 'exist' option).
>>
>>=> When you have registered the 'nonexist' event, you should get two alerts: one for the beginning and one for the end of the problem.
>>
>>Martin
>>
>>
>>Aaron Scamehorn wrote:
>>
>>>Hello,
>>>
>>>From version 4.8 to 4.8.2, the following bug has been introduced:
>>>
>>>When we issue a monit <service> start command, we get a "Does not exist" email and a corresponding "Exists" email.
>>>
>>>Here is the debug output showing this behavior in 4.8.2:
>>>'LogClient' Error testing process id [11034] -- No such process
>>>'LogClient' Error testing process id [11034] -- No such process
>>>'LogClient' start: /cogcap/ccts/bin/logclnt
>>>'LogClient' Error testing process id [11034] -- No such process
>>>Monitoring enabled -- service LogClient
>>>'LogClient' Error testing process id [11034] -- No such process
>>>'LogClient' process is not running
>>>Does not exist notification is sent to [EMAIL PROTECTED]
>>>'LogClient' Error testing process id [11034] -- No such process
>>>'LogClient' trying to restart
>>>Monitoring disabled -- service LogClient (stop)
>>>'LogClient' Error testing process id [11034] -- No such process
>>>'LogClient' process is running with pid 11189
>>>Exists notification is sent to [EMAIL PROTECTED]
>>>'LogClient' zombie check passed [status_flag=0000]
>>>'LogClient' loadavg(5min) check passed [current loadavg(5min)=0.2]
>>>'LogClient' cpu usage check passed [current cpu usage=0.0%]
>>>'LogClient' mem amount check passed [current mem amount=2776kB]
>>>
>>>
>>>Under version 4.8, we don't get the annoying "Does not exist" and corresponding "Exists" emails:
>>>
>>>'LogClient' Error testing process id [10970] -- No such process
>>>'LogClient' Error testing process id [10970] -- No such process
>>>'LogClient' start: /cogcap/ccts/bin/logclnt
>>>'LogClient' Error testing process id [10970] -- No such process
>>>Monitoring enabled -- service LogClient
>>>'LogClient' Error testing process id [10970] -- No such process
>>>'LogClient' Error testing process id [10970] -- No such process
>>>'LogClient' zombie check passed [status_flag=0000]
>>>'LogClient' loadavg(5min) check passed [current loadavg(5min)=0.1]
>>>'LogClient' cpu usage check passed [current cpu usage=0.0%]
>>>'LogClient' mem amount check passed [current mem amount=2776kB]
>>>
>>>
>>>
>>>Additionally, in our config file, we have the following set:
>>>set alert [EMAIL PROTECTED] only on { nonexist, exec, connection }
>>>
>>>We shouldn't be getting an "Exists" email under any circumstance, should we?
>>>
>>>Thanks,
>>>Aaron
>>>
>>>_______________________________________________
>>>monit-dev mailing list
>>>[email protected]
>>>http://lists.nongnu.org/mailman/listinfo/monit-dev
