Re: daemon poll interval - monit 5.14 on CentOS 6

Geoff Goas Mon, 08 Aug 2016 12:27:17 -0700

It looks like I can put "timeout X seconds" after the start program / stop
program lines to control this interval. Is there a way to set it globally?


On Fri, Aug 5, 2016 at 1:12 PM, Geoff Goas <[email protected]> wrote:

> I think I have found the issue, and I think I may have to walk back my
> statements on this affecting a certain version or distro.
>
> I have 8 monitored services in an "Execution failed" state. None of the
> services have a timeout defined.
>
> The timeout apparently defaults to EXEC_TIMEOUT (30 seconds). monit waits
> the full 30 seconds for the service check to finally fail before checking
> the next service that is also in an "Execution failed" state.
>
> [EDT Aug  5 13:03:58] error    : 'service_name' process is not running
> [EDT Aug  5 13:03:58] info     : 'service_name' trying to restart
> [EDT Aug  5 13:03:58] info     : 'service_name' start: /etc/init.d/service_name
> [EDT Aug  5 13:03:58] info     : Sleeping for 100 ms (src/control.c:127)
> [EDT Aug  5 13:03:58] info     : Sleeping for 100 ms (src/control.c:127)
> [EDT Aug  5 13:03:58] info     : Sleeping for 50000 ms (src/control.c:159)
> [EDT Aug  5 13:03:58] info     : Sleeping for 100000 ms (src/control.c:159)
> [EDT Aug  5 13:03:58] info     : Sleeping for 200000 ms (src/control.c:159)
> [EDT Aug  5 13:03:58] info     : Sleeping for 400000 ms (src/control.c:159)
> [EDT Aug  5 13:03:59] info     : Sleeping for 800000 ms (src/control.c:159)
> [EDT Aug  5 13:04:00] info     : Sleeping for 1600000 ms (src/control.c:159)
> [EDT Aug  5 13:04:01] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:02] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:03] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:04] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:05] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:06] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:07] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:08] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:09] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:10] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:11] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:12] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:13] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:14] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:15] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:16] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:17] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:18] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:19] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:20] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:21] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:22] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:23] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:24] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:25] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:26] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:27] info     : Sleeping for 1000000 ms (src/control.c:159)
> [EDT Aug  5 13:04:28] error    : 'service_name' failed to start (exit
> status 1) -- /etc/init.d/service_name: Shutting down service_name: [  OK
>  ]
> Starting service_name: [  OK  ]^M[FAILED]
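
As a rough cross-check on the 30 second figure, the sleep values in the excerpt above can be added up. This is only a sketch: it assumes the values logged at src/control.c:159 are really microseconds despite the "ms" label (the one-second spacing of the timestamps suggests so), and that the two src/control.c:127 entries are genuine milliseconds. The variable names are illustrative and not taken from the monit source.

```python
# Sleep values copied from the log excerpt above.
# Assumption: control.c:159 logs microseconds even though the label says "ms".
backoff_us = [50_000, 100_000, 200_000, 400_000, 800_000, 1_600_000]
steady_us = [1_000_000] * 27   # 27 entries, 13:04:01 through 13:04:27
initial_ms = [100, 100]        # the two control.c:127 entries, taken as milliseconds

total_seconds = (sum(backoff_us) + sum(steady_us)) / 1_000_000 + sum(initial_ms) / 1000
print(f"total time spent sleeping: {total_seconds:.2f} s")
```

Under those assumptions the total comes out a little over 30 seconds, in line with the gap between the 13:03:58 and 13:04:28 entries.
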
>
>
> 8 services at 30 seconds each = 240 seconds, which means the
> sleep(Run.polltime) in monit.c only gets called roughly every 4 minutes.
> This is with the daemon interval set to 10 seconds. Notice ~240 seconds
> (4 minutes) between each occurrence:
>
> # grep 'src/monit.c' /var/log/monit
> [EDT Aug  5 12:56:33] info     : Sleeping for 10 seconds (src/monit.c:561)
> [EDT Aug  5 13:00:46] info     : Sleeping for 10 seconds (src/monit.c:561)
> [EDT Aug  5 13:05:00] info     : Sleeping for 10 seconds (src/monit.c:561)
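
To put numbers on the grep output above, here is a small sketch that computes the gaps between the three timestamps and compares them with the estimate from the message (8 services at 30 seconds each, plus the 10 second daemon sleep). The helper name gaps_in_seconds is made up for this example.

```python
from datetime import datetime

# Timestamps copied from the grep output above (all on the same day).
stamps = ["12:56:33", "13:00:46", "13:05:00"]

def gaps_in_seconds(times):
    """Return the gap in seconds between each consecutive pair of HH:MM:SS strings."""
    parsed = [datetime.strptime(t, "%H:%M:%S") for t in times]
    return [int((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]

observed = gaps_in_seconds(stamps)

# Estimate from the message: 8 failing services x 30 s timeout, plus the 10 s sleep.
expected = 8 * 30 + 10

print("observed gaps:", observed)
print("expected cycle:", expected)
```

The observed gaps land a few seconds above the estimate, which would fit each start attempt taking slightly longer than its 30 second timeout.
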
>
> So how can I control the execTimeout without having monit give up on
> trying to start that service?
>
> Thanks,
>
> On Fri, Aug 5, 2016 at 11:43 AM, Geoff Goas <[email protected]> wrote:
>
>> Hello,
>>
>> Thanks for the suggestions. In RHEL/CentOS 5 and 6, the default config
>> is /etc/monit.conf. User configs are ~/.monit.conf. This is the only
>> change to the source that is being applied by the package maintainer.
>>
>> Package listing:
>>
>> # rpm -ql monit
>> /etc/logrotate.d/monit
>> /etc/monit.conf
>> /etc/monit.d
>> /etc/monit.d/logging
>> /etc/rc.d/init.d/monit
>> /usr/bin/monit
>> /usr/share/doc/monit-5.14
>> /usr/share/doc/monit-5.14/COPYING
>> /usr/share/doc/monit-5.14/README
>> /usr/share/man/man1/monit.1.gz
>> /var/log/monit
>>
>> From an strace of monit starting up:
>>
>> getcwd("/etc/monit.d", 4096)            = 13
>> stat("/root/.monit.conf", 0x7fff87cc7560) = -1 ENOENT (No such file or directory)
>> stat("/etc/monit.conf", {st_mode=S_IFREG|0600, st_size=11346, ...}) = 0
>> open("/etc/monit.conf", O_RDONLY)       = 3
>> open("/etc/monit.d", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4
>>
>> Showing that the set daemon directive is specified only once:
>>
>> # grep 'set daemon' /etc/monit.conf
>> #set daemon  30              # check services at 30 seconds intervals
>>
>> # grep 'set daemon' /etc/monit.d/*
>> /etc/monit.d/00base.conf:set daemon 50
>>
>> Here is the monit log showing the 30 second interval even though it is
>> set to 50:
>>
>> # grep 'Aborting event' /var/log/monit  | tail -n20
>> [EDT Aug  4 17:05:06] error    : Aborting event
>> [EDT Aug  4 17:05:36] error    : Aborting event
>> [EDT Aug  4 17:06:06] error    : Aborting event
>> [EDT Aug  4 17:06:37] error    : Aborting event
>> [EDT Aug  4 17:06:37] error    : Aborting event
>> [EDT Aug  4 17:07:07] error    : Aborting event
>> [EDT Aug  4 17:07:07] error    : Aborting event
>> [EDT Aug  4 22:18:13] error    : Aborting event
>> [EDT Aug  4 22:18:43] error    : Aborting event
>> [EDT Aug  4 22:19:13] error    : Aborting event
>> [EDT Aug  4 22:19:44] error    : Aborting event
>> [EDT Aug  4 22:20:14] error    : Aborting event
>> [EDT Aug  4 22:20:44] error    : Aborting event
>> [EDT Aug  4 22:21:15] error    : Aborting event
>> [EDT Aug  4 22:21:15] error    : Aborting event
>> [EDT Aug  4 22:21:45] error    : Aborting event
>> [EDT Aug  4 22:21:45] error    : Aborting event
>> [EDT Aug  5 11:19:40] error    : Aborting event
>> [EDT Aug  5 11:20:10] error    : Aborting event
>> [EDT Aug  5 11:23:53] error    : Aborting event
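
The 30 second spacing can be read straight off the log. The sketch below parses the first seven lines of the grep output above and prints the gaps between consecutive events. The regular expression and the parse helper are written for this example only, and the month name parsing assumes an English locale.

```python
import re
from datetime import datetime

# First seven lines of the grep output above, copied verbatim.
log = """\
[EDT Aug  4 17:05:06] error    : Aborting event
[EDT Aug  4 17:05:36] error    : Aborting event
[EDT Aug  4 17:06:06] error    : Aborting event
[EDT Aug  4 17:06:37] error    : Aborting event
[EDT Aug  4 17:06:37] error    : Aborting event
[EDT Aug  4 17:07:07] error    : Aborting event
[EDT Aug  4 17:07:07] error    : Aborting event
"""

# Matches "[TZ Mon D HH:MM:SS]" at the start of a line.
STAMP = re.compile(r"\[\w+ (\w+)\s+(\d+) (\d{2}:\d{2}:\d{2})\]")

def parse(line):
    """Extract the timestamp from one monit log line (the year is not logged)."""
    month, day, clock = STAMP.match(line).groups()
    return datetime.strptime(f"{month} {day} {clock}", "%b %d %H:%M:%S")

times = [parse(line) for line in log.splitlines()]
gaps = [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]

# Drop zero gaps, which come from two events logged in the same second.
nonzero_gaps = [g for g in gaps if g > 0]
print(nonzero_gaps)
```
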
>>
>> This behavior is occurring across multiple CentOS 6 hosts. All of the
>> CentOS 5 hosts running 4.11 and 5.2 with nearly identical
>> configurations ("alert...on restart" changed to "alert...on nonexist"
>> on the monit 5.x instances) do not have this issue.
>>
>> I'm open to more suggestions but I feel as though I will end up having
>> to get some more debug out of monit.
>>
>> Thanks,
>>
>> On Aug 5, 2016 9:16 AM, "Martin Pala" <[email protected]> wrote:
>> >
>> > Monit's default configuration file is /etc/monitrc. /etc/monit.conf is
>> > not used unless it was added to the search path by a 3rd party (for
>> > example, the package maintainer).
>> >
>> > There could also be a ".monitrc" file in your home directory. The
>> > default search sequence for the monit configuration file is:
>> >
>> >         ~/.monitrc
>> >         /etc/monitrc
>> >         @SYSCONFDIR/monitrc
>> >         /usr/local/etc/monitrc
>> >         ./monitrc
>> >
>> >
>> >
>> >
>> > > On 05 Aug 2016, at 15:07, Geoff Goas <[email protected]> wrote:
>> > >
>> > > I am setting it only in /etc/monit.conf. It is not being set in any
>> other configuration within /etc/monit.d.
>> > >
>> > > On Aug 5, 2016 9:03 AM, "Martin Pala" <[email protected]> wrote:
>> > > Hello,
>> > >
>> > > you most probably have two configuration files - the one you changed
>> > > is different from the file used by monit.
>> > >
>> > > Best regards,
>> > > Martin
>> > >
>> > >
>> > >> On 05 Aug 2016, at 04:42, Geoff Goas <[email protected]> wrote:
>> > >>
>> > >> Hello,
>> > >>
>> > >> I'm having an issue with the CentOS 6 release of monit 5.14. I have
>> > >> set the daemon interval to 5, 10, and 50 seconds - monit was fully
>> > >> restarted for each adjustment of the interval - yet it still polls every 30
>> > >> seconds, as if the configured value is being ignored. I also attempted
>> > >> passing the interval using the -d switch, to no avail.
>> > >>
>> > >> My testing consisted of having monit attempt to start a service that
>> could never possibly start, and without any timeout set. The log shows a 30
>> second interval between service checks, and so does an strace of the monit
>> process.
>> > >>
>> > >> I have monit 5.2 running on CentOS 5.2 with a nearly identical
>> configuration. On that host, I have the daemon interval set to 10 seconds,
>> and it is polling at that interval just fine.
>> > >>
>> > >> Do you have any recommendations on what to check next?
>> > >>
>> > >> Thanks,
>> > >>
>> > >> --
>> > >> Geoff Goas
>> > >> Systems Engineer
>> > >> --
>> > >> To unsubscribe:
>> > >> https://lists.nongnu.org/mailman/listinfo/monit-general
>> > >
>> > >
>> >
>> >
>>
>
>
>
> --
>
> *Geoff Goas*
> *Systems Engineer*
>



-- 

*Geoff Goas*
*Systems Engineer*

