intermittent user process tracking with monit

Sean Penticoff Tue, 17 Sep 2013 02:27:05 -0700

Hi,

Let me take a moment to describe what I'm trying to do, in case my tack is all wrong. We have several systems that process data for users. The programs the users run all live in a shared space and run in user space at the users' discretion. I would like to use monit to alert when one of these processes is started and have it track the memory and CPU usage, alerting further when the CPU or memory of that process exceeds a certain threshold (and possibly renicing it via some script).

I've currently set up alerts like this:
check process process1
    matching "process1"
    mode passive
    group processing
    if cpu is greater than 90% for 5 cycles then alert
    if memory is greater than 90% for 5 cycles then alert
check process process2
    matching "process2"
    mode passive
    group processing
    if cpu is greater than 90% for 5 cycles then alert
    if memory is greater than 90% for 5 cycles then alert
check process process3
    matching "process3"
    mode passive
    group processing
    if cpu is greater than 90% for 5 cycles then alert
    if memory is greater than 90% for 5 cycles then alert



...and it goes on for another dozen or so processes
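
As an aside, since the stanzas differ only in the process name, they could be generated rather than written by hand. This is a minimal sketch, not something I have in production; the names STANZA and render_stanzas are made up for illustration, and the thresholds are simply copied from the stanzas above.

```python
# Hypothetical helper: emit one monit stanza per process name.
# The stanza text mirrors the hand-written configuration above.
STANZA = """\
check process {name}
    matching "{name}"
    mode passive
    group processing
    if cpu is greater than 90% for 5 cycles then alert
    if memory is greater than 90% for 5 cycles then alert
"""


def render_stanzas(names):
    """Return the monit configuration text for the given process names."""
    return "".join(STANZA.format(name=name) for name in names)


if __name__ == "__main__":
    # Print the stanzas so they can be redirected into an included file.
    print(render_stanzas(["process1", "process2", "process3"]), end="")
```

This only removes the copy-and-paste; it does not change how monit treats a process that is not running.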

The setup above "works" but is not ideal.
What would be ideal is something more along the lines of:
check process process1
    matching "process1"

alert on statechange (basically ignore the fact that this process is not running, but let me know when it starts and ends [i.e. alert on a state change], and monitor it while it is running)

    mode passive
    group processing
    if cpu is greater than 90% for 5 cycles then alert
    if memory is greater than 90% for 5 cycles then alert

Also, we are using M/Monit, and every process on every machine that is NOT running shows up as a hit against overall health.

i.e.
under the host status:
Status  10 out of 27 services are available

and on the main dashboard:

[Sep 16 2013 15:59:47] Host myhost.example.com <https://im-on-it.crbs.ucsd.edu/status/hosts/detail?id=1656> reported a problem with process1: process is not running
[Sep 16 2013 15:59:44] Host myhost.example.com <https://im-on-it.crbs.ucsd.edu/status/hosts/detail?id=1656> reported a problem with process2: process is not running
[Sep 16 2013 15:59:40] Host myhost.example.com <https://im-on-it.crbs.ucsd.edu/status/hosts/detail?id=1656> reported a problem with process3: process is not running
[Sep 16 2013 15:59:35] Host myhost.example.com <https://im-on-it.crbs.ucsd.edu/status/hosts/detail?id=1656> reported a problem with process4: process is not running

multiplied by 20+ hosts
you get the idea.

The fact that the process isn't running is never a problem. I would like to reflect that somehow, and also have some insight into what's running where.

Another thing I would really like to be able to do is pass the process arguments into the alert emails.


i.e. when the command process1 -t foo -o bar -cfg process1.cfg -v -X -s

is run, I'd be tickled if I could get "-t foo -o bar -cfg process1.cfg -v -X -s" (or even the entire content of monit procmatch) into the alert somehow.
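
To make concrete what I mean by the arguments: on Linux they can be read from /proc/<pid>/cmdline, where they are stored separated by NUL bytes. The following is only a sketch of that lookup, assuming a Linux host; the function names are made up, and how the PID would reach such a script from monit, or how the result would get into the alert, is exactly the part I don't know.

```python
from pathlib import Path


def parse_cmdline(raw):
    """Split the NUL-separated bytes of /proc/<pid>/cmdline into arguments."""
    if not raw:
        # Empty for kernel threads and some zombie processes.
        return []
    parts = raw.rstrip(b"\0").split(b"\0")
    return [part.decode(errors="replace") for part in parts]


def args_for_pid(pid):
    """Return the arguments of a running process, without the program name.

    Assumes Linux with /proc mounted; raises OSError if the PID is gone.
    """
    raw = Path(f"/proc/{pid}/cmdline").read_bytes()
    return " ".join(parse_cmdline(raw)[1:])
```

For the example command above, args_for_pid would return the string "-t foo -o bar -cfg process1.cfg -v -X -s" while that process is running.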

I've only had this up and running for about a month, and monit has saved my bacon on filesystem checks and dead services several times. I'm just wanting to do a bit more than the system side of things with it.

