Currently the default start/stop/restart program timeout is hardcoded and can be 
overridden only per service using the "timeout" option, as you noted.

Switching the default timeout to the cycle length won't work: many users have poll 
cycles of around 2 minutes.

I think the best way would be to extend the "set limits" statement 
(https://mmonit.com/monit/documentation/monit.html#LIMITS) with a new option to 
set the start/stop/restart program timeouts. You can create an enhancement request 
via Bitbucket: https://bitbucket.org/tildeslash/monit/issues/new
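A minimal C sketch of the precedence such an option would imply, assuming a hypothetical global timeout configured through "set limits": a per-service "with timeout N seconds" wins, otherwise the global value applies, otherwise the built-in 30-second default (EXEC_TIMEOUT in this thread). All identifiers below (`exec_limits_t`, `TIMEOUT_UNSET`, `exec_timeout_resolve`, `start_timeout_for`) are illustrative and not taken from Monit's source.

```c
#include <assert.h>
#include <stddef.h>

/* Built-in fallback; the thread reports Monit's EXEC_TIMEOUT is 30 seconds. */
#define DEFAULT_EXEC_TIMEOUT 30

/* Hypothetical sentinel: "no timeout configured at this level". */
#define TIMEOUT_UNSET (-1)

/* Hypothetical global limits that an extended "set limits" might fill in. */
typedef struct {
    int start_timeout;   /* seconds, or TIMEOUT_UNSET */
    int stop_timeout;    /* seconds, or TIMEOUT_UNSET */
    int restart_timeout; /* seconds, or TIMEOUT_UNSET */
} exec_limits_t;

/* Effective timeout in seconds: per-service value first, then the
 * global limit, then the built-in default. */
static int exec_timeout_resolve(int service_timeout, int global_timeout) {
    if (service_timeout != TIMEOUT_UNSET)
        return service_timeout;
    if (global_timeout != TIMEOUT_UNSET)
        return global_timeout;
    return DEFAULT_EXEC_TIMEOUT;
}

/* Example for the start action; stop/restart would read their own field. */
static int start_timeout_for(const exec_limits_t *limits, int service_timeout) {
    assert(limits != NULL);
    return exec_timeout_resolve(service_timeout, limits->start_timeout);
}
```

Keeping the per-service option at the highest priority means existing configurations that already use "with timeout" would behave exactly as before.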

Best regards,
Martin



> On 09 Aug 2016, at 21:43, Geoff Goas <[email protected]> wrote:
> 
> Some more information: In monit 4.x, the start/stop program timeout defaulted 
> to 1 cycle, though this was not documented in the man page. Starting in 5.0, 
> it is no longer possible to specify this timeout in cycles, only seconds (per 
> the man page).
> 
> From the changelog:
> 
> * It is now possible to define execution timeout for start and
>   stop commands. That is, how long Monit will wait after
>   executing a command before it assume execution failed. If the
>   timeout option is omitted, Monit defaults to 30 seconds. You
>   can override the timeout for example for services which are
>   starting slower.
>   Example syntax:
>     start program = "/bin/foo start" with timeout 60 seconds
> 
> I tried to specify "1 cycle" for the start/stop program entries and got a 
> syntax error, as expected.
> 
> Would the devs entertain the notion of adding the ability to specify cycles, 
> or the ability to set the timeout at the global level?
> 
> With the following patch I can make the default timeout match the 
> daemon interval:
> 
> --- src/p.y.orig      2016-08-09 14:23:02.000000000 -0400
> +++ src/p.y   2016-08-09 14:23:48.000000000 -0400
> @@ -1550,7 +1550,7 @@
>                  ;
>  
>  exectimeout     : /* EMPTY */ {
> -                   $<number>$ = EXEC_TIMEOUT;
> +                   $<number>$ = Run.polltime;
>                    }
>                  | TIMEOUT NUMBER SECOND {
>                     $<number>$ = $2;
> --- src/y.tab.c.orig  2016-08-09 14:54:25.000000000 -0400
> +++ src/y.tab.c       2016-08-09 14:54:34.000000000 -0400
> @@ -4576,7 +4576,7 @@
>    case 418:
>  #line 1552 "src/p.y" /* yacc.c:1646  */
>      {
> -                   (yyval.number) = EXEC_TIMEOUT;
> +                   (yyval.number) = Run.polltime;
>                    }
>  #line 4581 "src/y.tab.c" /* yacc.c:1646  */
>      break;
> 
> 
> On Mon, Aug 8, 2016 at 3:26 PM, Geoff Goas <[email protected]> wrote:
> It looks like I can put "timeout X seconds" after the start program / stop 
> program lines to control this interval. Is there a way to set it globally?
> 
> On Fri, Aug 5, 2016 at 1:12 PM, Geoff Goas <[email protected]> wrote:
> I think I have found the issue, and I think I may have to walk back my 
> statements on this affecting a certain version or distro.
> 
> I have 8 monitored services in an "Execution failed" state. None of the 
> services have a timeout defined. 
> 
> The timeout apparently defaults to EXEC_TIMEOUT (30 seconds). monit waits the 
> full 30 seconds for the service check to finally fail before checking the 
> next service that is also in an "Execution failed" state. 
> 
> [EDT Aug  5 13:03:58] error    : 'service_name' process is not running
> [EDT Aug  5 13:03:58] info     : 'service_name' trying to restart
> [EDT Aug  5 13:03:58] info     : 'service_name' start: /etc/init.d/service_name
> [EDT Aug  5 13:03:58] info     : Sleeping for 100 ms (src/control.c:127)
> [EDT Aug  5 13:03:58] info     : Sleeping for 100 ms (src/control.c:127)
> [EDT Aug  5 13:03:58] info     : Sleeping for 50000 ms (src/control.c:159)
> [EDT Aug  5 13:03:58] info     : Sleeping for 100000 ms (src/control.c:159)
> [EDT Aug  5 13:03:58] info     : Sleeping for 200000 ms (src/control.c:159)
> [EDT Aug  5 13:03:58] info     : Sleeping for 400000 ms (src/control.c:159)
> [EDT Aug  5 13:03:59] info     : Sleeping for 800000 ms (src/control.c:159)
> [EDT Aug  5 13:04:00] info     : Sleeping for 1600000 ms (src/control.c:159)
> [EDT Aug  5 13:04:01] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:02] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:03] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:04] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:05] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:06] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:07] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:08] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:09] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:10] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:11] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:12] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:13] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:14] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:15] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:16] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:17] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:18] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:19] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:20] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:21] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:22] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:23] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:24] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:25] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:26] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:27] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:28] error    : 'service_name' failed to start (exit status 1) -- /etc/init.d/service_name: Shutting down service_name: [  OK  ]
> Starting service_name: [  OK  ]^M[FAILED]
> 
> 
> 8 services at 30 seconds each = 240 seconds, which means the 
> sleep(Run.polltime) in monit.c only gets called about every 4 minutes, even 
> though the daemon interval is set to 10 seconds. Notice the roughly 4 minutes 
> between occurrences:
> 
> # grep 'src/monit.c' /var/log/monit
> [EDT Aug  5 12:56:33] info     : Sleeping for 10 seconds (src/monit.c:561)
> [EDT Aug  5 13:00:46] info     : Sleeping for 10 seconds (src/monit.c:561)
> [EDT Aug  5 13:05:00] info     : Sleeping for 10 seconds (src/monit.c:561)
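The arithmetic above can be captured in a small C helper. It is a simplified model, not Monit's scheduler: it assumes services are checked one after another, that each service stuck in "Execution failed" blocks for its full start timeout, and that all other checks take negligible time. The function name `approx_cycle_seconds` is hypothetical.

```c
#include <assert.h>

/* Rough seconds between successive "Sleeping for N seconds" log lines.
 * Simplifying assumptions: services are checked one after another, each
 * service stuck in "Execution failed" blocks for its full start timeout,
 * and every other check takes negligible time. */
static long approx_cycle_seconds(int failing_services, int exec_timeout_s,
                                 int polltime_s) {
    assert(failing_services >= 0 && exec_timeout_s >= 0 && polltime_s >= 0);
    return (long)failing_services * exec_timeout_s + polltime_s;
}
```

With the numbers from this thread, `approx_cycle_seconds(8, 30, 10)` gives 250 seconds, close to the observed gaps of 253 and 254 seconds between the log lines. The same model shows the concern with tying the default timeout to the poll interval: with 8 failing services and a 120-second cycle, `approx_cycle_seconds(8, 120, 120)` gives 1080 seconds, or 18 minutes per cycle.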
> 
> So how can I control the execTimeout without having monit give up on trying 
> to start that service?
> 
> Thanks,
> 
> On Fri, Aug 5, 2016 at 11:43 AM, Geoff Goas <[email protected]> wrote:
> Hello,
> 
> Thanks for the suggestions. In RHEL/CentOS 5 and 6, the default config
> is /etc/monit.conf. User configs are ~/.monit.conf. This is the only
> change to the source that is being applied by the package maintainer.
> 
> Package listing:
> 
> # rpm -ql monit
> /etc/logrotate.d/monit
> /etc/monit.conf
> /etc/monit.d
> /etc/monit.d/logging
> /etc/rc.d/init.d/monit
> /usr/bin/monit
> /usr/share/doc/monit-5.14
> /usr/share/doc/monit-5.14/COPYING
> /usr/share/doc/monit-5.14/README
> /usr/share/man/man1/monit.1.gz
> /var/log/monit
> 
> From an strace of monit starting up:
> 
> getcwd("/etc/monit.d", 4096)            = 13
> stat("/root/.monit.conf", 0x7fff87cc7560) = -1 ENOENT (No such file or directory)
> stat("/etc/monit.conf", {st_mode=S_IFREG|0600, st_size=11346, ...}) = 0
> open("/etc/monit.conf", O_RDONLY)       = 3
> open("/etc/monit.d", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4
> 
> Showing that the set daemon directive is specified only once:
> 
> # grep 'set daemon' /etc/monit.conf
> #set daemon  30              # check services at 30 seconds intervals
> 
> # grep 'set daemon' /etc/monit.d/*
> /etc/monit.d/00base.conf:set daemon 50
> 
> Here is the monit log showing the 30-second interval even though the
> interval is set to 50:
> 
> # grep 'Aborting event' /var/log/monit  | tail -n20
> [EDT Aug  4 17:05:06] error    : Aborting event
> [EDT Aug  4 17:05:36] error    : Aborting event
> [EDT Aug  4 17:06:06] error    : Aborting event
> [EDT Aug  4 17:06:37] error    : Aborting event
> [EDT Aug  4 17:06:37] error    : Aborting event
> [EDT Aug  4 17:07:07] error    : Aborting event
> [EDT Aug  4 17:07:07] error    : Aborting event
> [EDT Aug  4 22:18:13] error    : Aborting event
> [EDT Aug  4 22:18:43] error    : Aborting event
> [EDT Aug  4 22:19:13] error    : Aborting event
> [EDT Aug  4 22:19:44] error    : Aborting event
> [EDT Aug  4 22:20:14] error    : Aborting event
> [EDT Aug  4 22:20:44] error    : Aborting event
> [EDT Aug  4 22:21:15] error    : Aborting event
> [EDT Aug  4 22:21:15] error    : Aborting event
> [EDT Aug  4 22:21:45] error    : Aborting event
> [EDT Aug  4 22:21:45] error    : Aborting event
> [EDT Aug  5 11:19:40] error    : Aborting event
> [EDT Aug  5 11:20:10] error    : Aborting event
> [EDT Aug  5 11:23:53] error    : Aborting event
> 
> This behavior is occurring across multiple CentOS 6 hosts. None of the
> CentOS 5 hosts running 4.11 and 5.2 with nearly identical
> configurations ("alert...on restart" changed to "alert...on nonexist"
> on the monit 5.x instances) have this issue.
> 
> I'm open to more suggestions, but I suspect I will end up having
> to get more debug output out of monit.
> 
> Thanks,
> 
> On Aug 5, 2016 9:16 AM, "Martin Pala" <[email protected]> wrote:
> >
> > Monit's default configuration file is /etc/monitrc; /etc/monit.conf is 
> > not used unless a third party (for example, a package maintainer) added 
> > it to the search path.
> >
> > There could also be a ".monitrc" file in your home directory. The default 
> > search sequence for the monit configuration file is:
> >
> >         ~/.monitrc
> >         /etc/monitrc
> >         @SYSCONFDIR/monitrc
> >         /usr/local/etc/monitrc
> >         ./monitrc
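A minimal C sketch of a first-match search over that list, assuming POSIX `access()` as the readability test and `$HOME` for `~`. It is not Monit's own code: `find_config` and `default_candidates` are hypothetical names, and `SYSCONFDIR` stands in for the build-time `@SYSCONFDIR` value (its `/usr/local/etc` default here is an assumption). A package that patches the search path, as the RHEL/CentOS package reportedly does with `/etc/monit.conf` and `~/.monit.conf`, would change only the candidate array.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Assumption: stands in for the configure-time @SYSCONFDIR value. */
#ifndef SYSCONFDIR
#define SYSCONFDIR "/usr/local/etc"
#endif

/* Search order as listed in the thread; "~/" means $HOME. */
static const char *const default_candidates[] = {
    "~/.monitrc",
    "/etc/monitrc",
    SYSCONFDIR "/monitrc",
    "/usr/local/etc/monitrc",
    "./monitrc",
};

/* Copy the first readable candidate (with "~/" expanded) into out.
 * Returns 0 on success, -1 if no candidate is readable. */
static int find_config(const char *const *candidates, size_t n,
                       char *out, size_t outlen) {
    assert(candidates != NULL && out != NULL && outlen > 0);
    const char *home = getenv("HOME");
    for (size_t i = 0; i < n; i++) {
        const char *c = candidates[i];
        int len;
        if (strncmp(c, "~/", 2) == 0) {
            if (home == NULL)
                continue; /* cannot expand "~" without $HOME */
            len = snprintf(out, outlen, "%s/%s", home, c + 2);
        } else {
            len = snprintf(out, outlen, "%s", c);
        }
        if (len < 0 || (size_t)len >= outlen)
            continue; /* path would be truncated; skip it */
        if (access(out, R_OK) == 0)
            return 0; /* first readable file wins */
    }
    return -1;
}
```

Because the first readable file wins, a stray `~/.monitrc` (or, with the patched package, `~/.monit.conf`) silently shadows the system-wide file, which is why checking which file monit actually opened is a useful first step.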
> >
> >
> >
> >
> > > On 05 Aug 2016, at 15:07, Geoff Goas <[email protected]> wrote:
> > >
> > > I am setting it only in /etc/monit.conf. It is not being set in any other 
> > > configuration within /etc/monit.d.
> > >
> > > On Aug 5, 2016 9:03 AM, "Martin Pala" <[email protected]> wrote:
> > > Hello,
> > >
> > > you most probably have two configuration files, and the one you changed 
> > > is not the file monit actually uses.
> > >
> > > Best regards,
> > > Martin
> > >
> > >
> > >> On 05 Aug 2016, at 04:42, Geoff Goas <[email protected]> wrote:
> > >>
> > >> Hello,
> > >>
> > >> I'm having an issue with the CentOS 6 release of monit 5.14. I have set 
> > >> the daemon interval to 5, 10, and 50 seconds (monit was fully restarted 
> > >> after each change), yet it still polls every 30 seconds as if the 
> > >> configured value is being ignored. I also tried passing the interval 
> > >> with the -d switch, to no avail.
> > >>
> > >> My testing consisted of having monit attempt to start a service that 
> > >> could never possibly start, and without any timeout set. The log shows a 
> > >> 30 second interval between service checks, and so does an strace of the 
> > >> monit process.
> > >>
> > >> I have monit 5.2 running on CentOS 5.2 with a nearly identical 
> > >> configuration. On that host, I have the daemon interval set to 10 
> > >> seconds, and it is polling at that interval just fine.
> > >>
> > >> Do you have any recommendations on what to check next?
> > >>
> > >> Thanks,
> > >>
> > >> --
> > >> Geoff Goas
> > >> Systems Engineer
> > >> --
> > >> To unsubscribe:
> > >> https://lists.nongnu.org/mailman/listinfo/monit-general 
> > >> <https://lists.nongnu.org/mailman/listinfo/monit-general>
> 
