Currently the default start/stop/restart program timeout is hardcoded (30 seconds) and can be overridden only locally, using the "timeout" option on each statement, as you noted.
Switching the default timeout to the cycle length won't work: many users have a poll cycle of around 2 minutes. I think the best way would be to extend the "set limits" statement (https://mmonit.com/monit/documentation/monit.html#LIMITS) with a new option to set the start/stop/restart program timeouts. You can create an enhancement request via Bitbucket: https://bitbucket.org/tildeslash/monit/issues/new

Best regards,
Martin

> On 09 Aug 2016, at 21:43, Geoff Goas <[email protected]> wrote:
>
> Some more information: In monit 4.x, the start/stop program timeout defaulted
> to 1 cycle, though this was not documented in the man page. Starting in 5.0,
> it is no longer possible to specify this timeout in cycles, only seconds (per
> the man page).
>
> From the changelog:
>
> * It is now possible to define execution timeout for start and
>   stop commands. That is, how long Monit will wait after
>   executing a command before it assume execution failed. If the
>   timeout option is omitted, Monit defaults to 30 seconds. You
>   can override the timeout for example for services which are
>   starting slower.
>   Example syntax:
>     start program = "/bin/foo start" with timeout 60 seconds
>
> I tried to specify "1 cycle" for the start/stop program entries, and I get a
> syntax error as expected.
>
> Would the devs entertain the notion of adding the ability to specify cycles,
> or the ability to set the timeout at the global level?
>
> With the following patch I can make the default timeout to match up with the
> daemon interval:
>
> --- src/p.y.orig 2016-08-09 14:23:02.000000000 -0400
> +++ src/p.y 2016-08-09 14:23:48.000000000 -0400
> @@ -1550,7 +1550,7 @@
>                  ;
>
>  exectimeout     : /* EMPTY */ {
> -                    $<number>$ = EXEC_TIMEOUT;
> +                    $<number>$ = Run.polltime;
>                    }
>                  | TIMEOUT NUMBER SECOND {
>                      $<number>$ = $2;
> --- src/y.tab.c.orig 2016-08-09 14:54:25.000000000 -0400
> +++ src/y.tab.c 2016-08-09 14:54:34.000000000 -0400
> @@ -4576,7 +4576,7 @@
>    case 418:
>  #line 1552 "src/p.y" /* yacc.c:1646 */
>      {
> -        (yyval.number) = EXEC_TIMEOUT;
> +        (yyval.number) = Run.polltime;
>      }
>  #line 4581 "src/y.tab.c" /* yacc.c:1646 */
>      break;
>
>
> On Mon, Aug 8, 2016 at 3:26 PM, Geoff Goas <[email protected]> wrote:
> It looks like I can put "timeout X seconds" after the start program / stop
> program lines to control this interval. Is there a way to set it globally?
>
> On Fri, Aug 5, 2016 at 1:12 PM, Geoff Goas <[email protected]> wrote:
> I think I have found the issue, and I think I may have to walk back my
> statements on this affecting a certain version or distro.
>
> I have 8 monitored services in an "Execution failed" state. None of the
> services have a timeout defined.
>
> The timeout apparently defaults to EXEC_TIMEOUT (30 seconds). monit waits the
> full 30 seconds for the service check to finally fail before checking the
> next service that is also in an "Execution failed" state.
>
> [EDT Aug 5 13:03:58] error : 'service_name' process is not running
> [EDT Aug 5 13:03:58] info : 'service_name' trying to restart
> [EDT Aug 5 13:03:58] info : 'service_name' start: /etc/init.d/service_name
> [EDT Aug 5 13:03:58] info : Sleeping for 100 ms (src/control.c:127)
> [EDT Aug 5 13:03:58] info : Sleeping for 100 ms (src/control.c:127)
> [EDT Aug 5 13:03:58] info : Sleeping for 50000 ms (src/control.c:159)
> [EDT Aug 5 13:03:58] info : Sleeping for 100000 ms (src/control.c:159)
> [EDT Aug 5 13:03:58] info : Sleeping for 200000 ms (src/control.c:159)
> [EDT Aug 5 13:03:58] info : Sleeping for 400000 ms (src/control.c:159)
> [EDT Aug 5 13:03:59] info : Sleeping for 800000 ms (src/control.c:159)
> [EDT Aug 5 13:04:00] info : Sleeping for 1600000 ms (src/control.c:159)
> [EDT Aug 5 13:04:01] info : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug 5 13:04:02] info : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug 5 13:04:03] info : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug 5 13:04:04] info : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug 5 13:04:05] info : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug 5 13:04:06] info : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug 5 13:04:07] info : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug 5 13:04:08] info : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug 5 13:04:09] info : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug 5 13:04:10] info : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug 5 13:04:11] info : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug 5 13:04:12] info : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug 5 13:04:13] info : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug 5 13:04:14] info : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug 5 13:04:15] info : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug 5 13:04:16] info : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug 5 13:04:17] info : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug 5 13:04:18] info : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug 5 13:04:19] info : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug 5 13:04:20] info : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug 5 13:04:21] info : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug 5 13:04:22] info : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug 5 13:04:23] info : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug 5 13:04:24] info : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug 5 13:04:25] info : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug 5 13:04:26] info : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug 5 13:04:27] info : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug 5 13:04:28] error : 'service_name' failed to start (exit status 1) -- /etc/init.d/service_name: Shutting down service_name: [ OK ]
> Starting service_name: [ OK ]^M[FAILED]
>
>
> 8 services at 30 seconds each = 240 seconds, this means the
> sleep(Run.polltime) in monit.c only gets called every 4 minutes. This is with
> the daemon interval set to 10 seconds. Notice ~240 seconds (4 minutes)
> between each occurrence:
>
> # grep 'src/monit.c' /var/log/monit
> [EDT Aug 5 12:56:33] info : Sleeping for 10 seconds (src/monit.c:561)
> [EDT Aug 5 13:00:46] info : Sleeping for 10 seconds (src/monit.c:561)
> [EDT Aug 5 13:05:00] info : Sleeping for 10 seconds (src/monit.c:561)
>
> So how can I control the execTimeout without having monit give up on trying
> to start that service?
>
> Thanks,
>
> On Fri, Aug 5, 2016 at 11:43 AM, Geoff Goas <[email protected]> wrote:
> Hello,
>
> Thanks for the suggestions. In RHEL/CentOS 5 and 6, the default config
> is /etc/monit.conf. User configs are ~/.monit.conf. This is the only
> change to the source that is being applied by the package maintainer.
>
> Package listing:
>
> # rpm -ql monit
> /etc/logrotate.d/monit
> /etc/monit.conf
> /etc/monit.d
> /etc/monit.d/logging
> /etc/rc.d/init.d/monit
> /usr/bin/monit
> /usr/share/doc/monit-5.14
> /usr/share/doc/monit-5.14/COPYING
> /usr/share/doc/monit-5.14/README
> /usr/share/man/man1/monit.1.gz
> /var/log/monit
>
> From an strace of monit starting up:
>
> getcwd("/etc/monit.d", 4096) = 13
> stat("/root/.monit.conf", 0x7fff87cc7560) = -1 ENOENT (No such file or directory)
> stat("/etc/monit.conf", {st_mode=S_IFREG|0600, st_size=11346, ...}) = 0
> open("/etc/monit.conf", O_RDONLY) = 3
> open("/etc/monit.d", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4
>
> Showing that the set daemon directive is specified only once:
>
> # grep 'set daemon' /etc/monit.conf
> #set daemon 30 # check services at 30 seconds intervals
>
> # grep 'set daemon' /etc/monit.d/*
> /etc/monit.d/00base.conf:set daemon 50
>
> Here is the monit log showing the 30 second interval even though it is
> set to 50:
>
> # grep 'Aborting event' /var/log/monit | tail -n20
> [EDT Aug 4 17:05:06] error : Aborting event
> [EDT Aug 4 17:05:36] error : Aborting event
> [EDT Aug 4 17:06:06] error : Aborting event
> [EDT Aug 4 17:06:37] error : Aborting event
> [EDT Aug 4 17:06:37] error : Aborting event
> [EDT Aug 4 17:07:07] error : Aborting event
> [EDT Aug 4 17:07:07] error : Aborting event
> [EDT Aug 4 22:18:13] error : Aborting event
> [EDT Aug 4 22:18:43] error : Aborting event
> [EDT Aug 4 22:19:13] error : Aborting event
> [EDT Aug 4 22:19:44] error : Aborting event
> [EDT Aug 4 22:20:14] error : Aborting event
> [EDT Aug 4 22:20:44] error : Aborting event
> [EDT Aug 4 22:21:15] error : Aborting event
> [EDT Aug 4 22:21:15] error : Aborting event
> [EDT Aug 4 22:21:45] error : Aborting event
> [EDT Aug 4 22:21:45] error : Aborting event
> [EDT Aug 5 11:19:40] error : Aborting event
> [EDT Aug 5 11:20:10] error : Aborting event
> [EDT Aug 5 11:23:53] error : Aborting event
>
> This behavior is occurring across multiple CentOS 6 hosts. All of the
> CentOS 5 hosts running 4.11 and 5.2 with nearly identical
> configurations ("alert...on restart" changed to "alert...on nonexist"
> on the monit 5.x instances) do not have this issue.
>
> I'm open to more suggestions but I feel as though I will end up having
> to get some more debug out of monit.
>
> Thanks,
>
> On Aug 5, 2016 9:16 AM, "Martin Pala" <[email protected]> wrote:
> >
> > Monit's default configuration file is /etc/monitrc ... the /etc/monit.conf
> > is not used, unless it was added to the search path by 3rd party (for
> > example package maintainer).
> >
> > There could be also ".monitrc" file in your home directory ... the default
> > search sequence for monit configuration file:
> >
> > ~/.monitrc
> > /etc/monitrc
> > @SYSCONFDIR/monitrc
> > /usr/local/etc/monitrc
> > ./monitrc
> >
> >
> > > On 05 Aug 2016, at 15:07, Geoff Goas <[email protected]> wrote:
> > >
> > > I am setting it only in /etc/monit.conf. It is not being set in any other
> > > configuration within /etc/monit.d.
> > >
> > > On Aug 5, 2016 9:03 AM, "Martin Pala" <[email protected]> wrote:
> > > Hello,
> > >
> > > you have most probably two configuration files - the one which you
> > > changed is different from the file used by monit.
> > >
> > > Best regards,
> > > Martin
> > >
> > >
> > >> On 05 Aug 2016, at 04:42, Geoff Goas <[email protected]> wrote:
> > >>
> > >> Hello,
> > >>
> > >> I'm having an issue with the CentOS 6 release of monit 5.14. I have set
> > >> the daemon interval to 5, 10, and 50 seconds - monit was fully restarted
> > >> for each adjustment of the interval - yet it still polls every 30
> > >> seconds as if the configured value is being ignored. I also attempted
> > >> passing the interval using the -d switch to no avail.
> > >>
> > >> My testing consisted of having monit attempt to start a service that
> > >> could never possibly start, and without any timeout set. The log shows a
> > >> 30 second interval between service checks, and so does an strace of the
> > >> monit process.
> > >>
> > >> I have monit 5.2 running on CentOS 5.2 with a nearly identical
> > >> configuration. On that host, I have the daemon interval set to 10
> > >> seconds, and it is polling at that interval just fine.
> > >>
> > >> Do you have any recommendations on what to check next?
> > >>
> > >> Thanks,
> > >>
> > >> --
> > >> Geoff Goas
> > >> Systems Engineer
>
> --
> Geoff Goas
> Systems Engineer
-- To unsubscribe: https://lists.nongnu.org/mailman/listinfo/monit-general
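
For anyone reading this thread later, the arithmetic quoted above (8 services in "Execution failed", each waiting out the 30 second default before the next one is tried, then the 10 second daemon sleep) can be checked with a small sketch. This is only a rough model, assuming services are handled one after another and every failing start blocks for the full timeout, as the thread describes. The function names are made up for illustration and are not part of monit.

```python
from datetime import datetime


def estimated_cycle_seconds(failing_services, exec_timeout, poll_interval):
    """Rough length of one cycle: every failing service blocks for the
    full exec timeout, then the daemon sleeps for the poll interval.
    Hypothetical helper, not a monit function."""
    return failing_services * exec_timeout + poll_interval


def observed_gaps(timestamps):
    """Seconds between consecutive HH:MM:SS log timestamps (same day)."""
    parsed = [datetime.strptime(t, "%H:%M:%S") for t in timestamps]
    return [int((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]


if __name__ == "__main__":
    # Values from the thread: 8 failing services, 30 s default, "set daemon 10".
    print(estimated_cycle_seconds(8, 30, 10))  # 250

    # Timestamps of the "Sleeping for 10 seconds" lines quoted in the thread.
    print(observed_gaps(["12:56:33", "13:00:46", "13:05:00"]))  # [253, 254]
```

The estimate of 250 seconds is close to the 253 and 254 second gaps in the quoted log. Under the same model, a default timeout equal to the 10 second daemon interval (what the quoted patch does) would give 8 * 10 + 10 = 90 seconds per cycle.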
