Strange, as on CentOS 5 it was enough to restart monit only.

Anyhow, thanks for helping to finally pinpoint the issue.

On Dec 21, 2015 4:46 PM, "Martin Pala" <[email protected]> wrote:
> The monitoring state persistency is part of monit for very long time - i
> think most probably even monit <= 3.x worked like this.
>
> As mentioned in the previous email, you can remove the timeout statement -
> if it won't be possible to recover the service and restart action will be
> called each cycle, there will be no limit on number of restarts, but you
> will be notified and can fix it manually (which is required in such case
> anyway).
>
> Regards,
> Martin
>
>
> On 21 Dec 2015, at 12:09, Stas Oskin <[email protected]> wrote:
>
> Alternatively, if no such mode is possible, are there any issues to just
> having the monit running in endless loop trying to recover the service?
>
> On Mon, Dec 21, 2015 at 1:08 PM, Stas Oskin <[email protected]> wrote:
>
>> Hi,
>>
>> Thanks for the clarification, it seems this was exactly the change
>> between 4 and 5 that caused us the confusion.
>>
>> Is there a way to have a previous mode of operation, where the monit will
>> reset the state by restarting the monit itself (and not the server, as per
>> your suggestion)?
>>
>> Thanks.
>>
>> On Wed, Dec 16, 2015 at 6:11 PM, Martin Pala <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> Monit will disable the process monitoring on excessive restart failures
>>> due to "if 5 restart within 5 cycles then timeout" statement in your
>>> configuration (the "timeout" action an alias for "unmonitor" and we
>>> switched the documentation in the past to "unmonitor" as it is more clear:
>>> https://mmonit.com/monit/documentation/monit.html#SERVICE-RESTART-LIMIT)
>>>
>>> The monitoring state is persistent - the "timed out" service has usually
>>> some hard error which requires manual intervention (timeout statement
>>> prevents endless restart loop). When the problem is resolved, the
>>> monitoring needs to be enabled manually.
>>>
>>> If you want to drop the state for example after reboot, place the
>>> statefile to tmpfs filesystem (you can use "set statefile <path>" statement
>>> to customize state file placement).
>>>
>>> Regards,
>>> Martin
>>>
>>>
>>>
>>> On 16 Dec 2015, at 14:20, Stas Oskin <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> After some more digging, it occurred to me that it might that monit just
>>> stops monitoring the process after it unable to restart it.
>>>
>>> So on monit 4.x it appears this state was cleared when just restarting
>>> monit, while on 5.x it seems you need actually to mark the check as active
>>> manually via the monit command.
>>>
>>> Is this correct?
>>>
>>> On Fri, Nov 20, 2015 at 5:37 PM, Martin Pala <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> please run monit in debug mode and send output:
>>>>
>>>> monit -vI
>>>>
>>>> Regards,
>>>> Martin
>>>>
>>>>
>>>>
>>>> On 19 Nov 2015, at 20:39, Stas Oskin <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> The monit log shows only the general start-up information.
>>>>
>>>> There is no messages about the processes going offline, it's like monit
>>>> does not use the pid file to find the process anymore.
>>>>
>>>> When I use HTTP port probing though it works just fine. Any idea what
>>>> could it be?
>>>>
>>>> Regards.
>>>>
>>>> On Sun, Nov 15, 2015 at 9:16 PM, Martin Pala <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> please can you provide more details about the problem? (error messages
>>>>> and/or monit log).
>>>>>
>>>>> Note that monit 5.9 includes fix for program execution for
>>>>> CentOS6/RHEL6, we recommend upgrade to latest monit version (5.15), you
>>>>> can get it here: https://mmonit.com/monit/#download. You can build rpm
>>>>> directly from the source code release: rpmbuild -tb monit-5.15.tar.gz. I
>>>>> think RHEL uses custom configuration file, official monit looks for
>>>>> /etc/monitrc, so you may need to rename the configuration file or create a
>>>>> link after upgrade.
>>>>>
>>>>> Regards,
>>>>> Martin
>>>>>
>>>>>
>>>>> > On 14 Nov 2015, at 16:51, Stas Oskin <[email protected]> wrote:
>>>>> >
>>>>> > Hi,
>>>>> >
>>>>> > Monit has reliably served us through the years, and we are very
>>>>> > happy of it.
>>>>> >
>>>>> > Unfortunately during scheduled migration to CentOS 6 due CentOS 5
>>>>> > EOL, we discovered it stopped monitoring the services pid files. HTTP
>>>>> > monitoring works fine.
>>>>> >
>>>>> > The CentOS 6 version is:
>>>>> > monit-5.1.1-4.el6.x86_64
>>>>> >
>>>>> > CentOS 5 version is:
>>>>> > monit-4.10.1-8.el5
>>>>> >
>>>>> > An example config that not working anymore (but accepted by monit
>>>>> > when starting):
>>>>> > check process XXXX with pidfile /XXXX/pid/XXXXX.pid
>>>>> > start program "/etc/init.d/xxxxx restart"
>>>>> > stop program "/etc/init.d/xxxxxx stop"
>>>>> > if mem usage > 85% then restart
>>>>> > if 5 restarts within 5 cycles then timeout
>>>>> >
>>>>> > I guess something changed in configuration jump from 4 to 5, will
>>>>> > appreciate any advice.
>>>>> >
>>>>> > Thanks!
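
The two options discussed in this thread (keep the restart limit and re-enable monitoring by hand, or let monit forget the unmonitored state on reboot) might look roughly like the monit control file fragment below. This is only a sketch under stated assumptions: the service name, pid file and init script paths are placeholders, and /dev/shm is assumed to be a tmpfs mount on the host, which should be checked before relying on it.

```
# Assumption: /dev/shm is tmpfs, so the state file is lost on reboot and
# a previously unmonitored service is monitored again after the reboot.
set statefile /dev/shm/monit.state

# Placeholder name and paths; adapt them to the real service.
check process myservice with pidfile /var/run/myservice.pid
  start program = "/etc/init.d/myservice start"
  stop program = "/etc/init.d/myservice stop"
  # Memory test kept exactly as written in the original config.
  if mem usage > 85% then restart
  # Option A: keep the restart limit. "unmonitor" is the documented name
  # of the "timeout" action; monitoring must then be re-enabled manually.
  if 5 restarts within 5 cycles then unmonitor
  # Option B: remove the line above so monit retries the restart every
  # cycle and keeps sending notifications until the service recovers.
```

Once the underlying problem is fixed, monitoring of an unmonitored service can be switched back on with "monit monitor myservice" (or "monit monitor all" for every service).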
--
To unsubscribe:
https://lists.nongnu.org/mailman/listinfo/monit-general
