Yeah, I was looking through the code and saw the call that checks whether
the process is running before issuing stop (ProcessTree_findProcess), so
that was the only thought I had as well. Here is the configuration for
both services:

check process foo matching /usr/local/bin/foo.py
  start program = "/bin/bash -l -c 'nohup /usr/local/bin/foo.py &'" as uid "nobody"
  stop program = "/usr/bin/pkill -u nobody -f /usr/local/bin/foo.py" as uid "nobody"
  if uptime > 11 hours then alert
  if uptime > 12 hours then exec "/usr/bin/pkill -u nobody -f -9 /usr/local/bin/foo.py" as uid "nobody"
  if 2 restarts within 3 cycles then timeout
  group apps
  depends foo.py

check process bar matching ^/usr/local/bin/bar
  start program = "/bin/bash -lc 'HOME=/home/someuser nohup /usr/local/bin/bar.sh > /tmp/bar-startup.out 2>&1 &'"
  stop program = "/bin/bash -c '/usr/bin/pkill -f ^/usr/local/bin/bar; sleep 1; /usr/bin/pkill -f ^/usr/local/bin/bar'"
  onreboot nostart
  if uptime > 12 hours then exec "/usr/bin/pkill -9 -f ^/usr/local/bin/bar"
  group apps
  mode passive

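To make the "process not running, so stop is skipped" explanation concrete, here is a minimal Python sketch of how a `check process ... matching` service might decide whether to run its stop program. It is a simplified model, not monit's C code: `Proc`, `find_process` and `stop_service` are hypothetical names, the `matching` pattern is assumed to be tested against the full command line (as `pkill -f` does), Python's `re` stands in for POSIX regex, and the process table is invented for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Proc:
    """One entry of a hypothetical process table."""
    pid: int
    cmdline: str  # full command line, arguments joined by spaces


def find_process(pattern: str, table: List[Proc]) -> Optional[Proc]:
    """Return the first process whose command line matches `pattern`, or None.

    Loosely plays the role of ProcessTree_findProcess; this is not monit's code.
    """
    regex = re.compile(pattern)
    for proc in table:
        if regex.search(proc.cmdline):
            return proc
    return None


def stop_service(name: str, pattern: str, stop_cmd: str, table: List[Proc]) -> List[str]:
    """Return the log lines this simplified model emits for a user stop request."""
    log = [f"'{name}' stop on user request"]
    if find_process(pattern, table) is not None:
        # Only when a matching process exists is the stop program executed.
        log.append(f"'{name}' stop: '{stop_cmd}'")
    # Fast path: with no matching process the stop program is skipped,
    # but the action is still reported as done.
    log.append(f"'{name}' stop action done")
    return log


# Invented (hypothetical) process table: here bar is shown running as a
# script under bash, so its command line starts with /bin/bash.
table = [
    Proc(101, "/usr/bin/python3 /usr/local/bin/foo.py"),
    Proc(202, "/bin/bash /usr/local/bin/bar.sh"),
]

foo_log = stop_service(
    "foo",
    "/usr/local/bin/foo.py",
    "/usr/bin/pkill -u nobody -f /usr/local/bin/foo.py",
    table,
)
bar_log = stop_service(
    "bar",
    "^/usr/local/bin/bar",
    "/bin/bash -c '/usr/bin/pkill -f ^/usr/local/bin/bar; sleep 1; "
    "/usr/bin/pkill -f ^/usr/local/bin/bar'",
    table,
)

for line in foo_log + bar_log:
    print(line)
```

With this made-up table, the foo stop logs its `stop:` line, while the bar stop produces only "stop on user request" and "stop action done", the same shape as the bar excerpt in the original report. The `^` anchors the pattern at the start of the command line, so in this model a process whose command line begins with an interpreter path would not be found by it.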
Here are the logs for "bar" from yesterday and today:
[CST Mar 1 15:15:01] info : 'bar' stop action done
[CST Mar 4 07:02:01] info : 'bar' start on user request
[CST Mar 4 07:02:01] info : 'bar' start action done
[CST Mar 4 07:02:01] error : 'bar' uptime test failed for /usr/local/bin/bar-- current uptime is 259177 seconds
<we get the above because it failed to shut down on 3/1>
[CST Mar 4 07:02:01] info : 'bar' exec: '/usr/bin/pkill -9 -f /usr/local/bin/bar'
[CST Mar 4 07:02:21] error : 'bar' process is not running
<the above line repeats every 20 seconds until we manually start it via monit>
[CST Mar 4 07:51:11] info : 'bar' start: '/bin/bash -lc HOME=/home/someuser nohup /usr/local/bin/bar.sh > /tmp/bar-startup.out 2>&1 &'
[CST Mar 4 07:51:11] info : 'bar' start action done
[CST Mar 4 07:51:11] info : 'bar' process is running with pid 4897
[CST Mar 4 07:51:11] info : 'bar' uptime test succeeded [current uptime = 1 seconds]
[CST Mar 4 15:15:01] info : 'bar' stop on user request
[CST Mar 4 15:15:01] info : 'bar' stop action done
<the same thing repeats the following morning>
[CST Mar 5 07:02:01] info : 'bar' start on user request
[CST Mar 5 07:02:01] info : 'bar' start action done
[CST Mar 5 07:02:01] error : 'bar' uptime test failed for /usr/local/bin/bar-- current uptime is 83451 seconds
[CST Mar 5 07:02:01] info : 'bar' exec: '/usr/bin/pkill -9 -f /usr/local/bin/bar'
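The uptime values in these logs can be turned back into start times, which shows which start each surviving process came from. A small Python sketch, assuming the CST timestamps are local times in 2019 (the year of the mail) and that no DST change falls in the window (US DST began on 10 Mar 2019); `started_at` is a hypothetical helper name:

```python
from datetime import datetime, timedelta


def started_at(logged: datetime, uptime_s: int) -> datetime:
    """Back out a process start time from a logged uptime value (naive CST)."""
    return logged - timedelta(seconds=uptime_s)


# Values copied from the "uptime test failed" lines above.
mar4 = started_at(datetime(2019, 3, 4, 7, 2, 1), 259177)
mar5 = started_at(datetime(2019, 3, 5, 7, 2, 1), 83451)

print(mar4)  # 2019-03-01 07:02:24
print(mar5)  # 2019-03-04 07:51:10
```

The Mar 5 value points to a start one second before the manual start logged at 07:51:11 on Mar 4 (which reported an uptime of 1 second), so the process still alive after that day's 15:15:01 "stop action done" is consistent with being pid 4897. The Mar 4 value points to a start on the morning of Mar 1 (not in this excerpt), consistent with the note that the Mar 1 stop did not shut it down.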
Thanks again for looking. Worst case I'll just build a debug version of
monit with some extra logging to see what is going on.
On Tue, Mar 5, 2019 at 2:40 PM [email protected] <[email protected]> wrote:
> Hi,
>
> could you please add the configuration of the "foo" and "bar" services?
>
> There are, for example, these possible reasons:
>
> 1.) the "bar" service is a process check and monit detected that the
> process is not running; in this case it takes a fast path and the stop
> program is skipped (there is no running process to stop)
>
> 2.) there was a bug when "check program" was used in combination with
> the "every" statement; it was fixed in monit 5.25.3:
> https://bitbucket.org/tildeslash/monit/issues/759
>
> Best regards,
> Martin
>
>
> On 5 Mar 2019, at 16:24, Marc Rossi <[email protected]> wrote:
>
> Looking through the source right now, but figured I'd throw it out to the
> list to see if this is something obvious I'm doing wrong.
>
> Long-time monit user, but on a few of our apps we have recently been
> having problems with the shutdown action possibly not running.
>
> For the app that DOES shut down properly logs show the following:
>
> [CST Mar 4 17:00:02] info : 'foo' stop on user request
> [CST Mar 4 17:00:02] info : Monit daemon with PID 17733 awakened
> [CST Mar 4 17:00:02] info : Awakened by User defined signal 1
> [CST Mar 4 17:00:02] info : 'foo' stop: '/usr/bin/pkill -u nobody -f /usr/local/bin/foo.py'
> [CST Mar 4 17:00:02] info : 'foo' stop action done
>
> For the app that is not stopping properly logs show the following:
>
> [CST Mar 4 15:15:01] info : 'bar' stop on user request
> [CST Mar 4 15:15:01] info : Monit daemon with PID 17733 awakened
> [CST Mar 4 15:15:01] info : Awakened by User defined signal 1
> [CST Mar 4 15:15:01] info : 'bar' stop action done
>
> Could be a red herring, but where is the stop action line in the second log
> excerpt? The shutdown commands are indeed different between foo and bar,
> but I would still expect to see the stop command logged.
>
> TIA
> Marc
>
> --
> To unsubscribe:
> https://lists.nongnu.org/mailman/listinfo/monit-general