check program FOO with path BAR problem solved
On Tue, Sep 17, 2013 at 5:22 AM, Sean Penticoff <[email protected]> wrote:

> Hi,
> Let me take a moment and try and describe what it is I'm trying to do in
> case my tack is all wrong.
> We have several systems that process data for users. The programs the
> users run all run from a shared space and run in user space at the users
> discretion. I would like to use monit to alert when one of these processes
> is started and have it track the memory and cpu usage, further alerting on
> a condition where cpu or mem of that process exceeds a certain threshold
> (and possibly renicing it via some script)
> I've currently set up alerts like this:
>
> check process process1
>     matching "process1"
>     mode passive
>     group processing
>     if cpu is greater than 90% for 5 cycles then alert
>     if memory is greater than 90% for 5 cycles then alert
> check process process2
>     matching "process2"
>     mode passive
>     group processing
>     if cpu is greater than 90% for 5 cycles then alert
>     if memory is greater than 90% for 5 cycles then alert
> check process process3
>     matching "process3"
>     mode passive
>     group processing
>     if cpu is greater than 90% for 5 cycles then alert
>     if memory is greater than 90% for 5 cycles then alert
>
> ...and it goes on for another dozen or so processes
>
> This "works" but is not ideal
> what would be ideal is more along the lines of
>
> check process process1
>     matching "process1"
>     alert on statechange (basically ignore the fact this process is not
>     running but let me know when it starts and ends [i.e alert on state a
>     change] and monitor it when it is running)
>     mode passive
>     group processing
>     if cpu is greater than 90% for 5 cycles then alert
>     if memory is greater than 90% for 5 cycles then alert
>
> Also we are using m/monit and every process on every machine that is NOT
> running shows up as a hit against overall health
> i.e.
> under the host status:
> Status 10 out of 27 services are available
>
> and on the main dashboard:
>
> ×[Sep 16 2013 15:59:47] Host *myhost.example.com*
>   <https://im-on-it.crbs.ucsd.edu/status/hosts/detail?id=1656>
>   reported a problem with *process1*: process is not running
> ×[Sep 16 2013 15:59:44] Host *myhost.example.com*
>   <https://im-on-it.crbs.ucsd.edu/status/hosts/detail?id=1656>
>   reported a problem with *process2*: process is not running
> ×[Sep 16 2013 15:59:40] Host *myhost.example.com*
>   <https://im-on-it.crbs.ucsd.edu/status/hosts/detail?id=1656>
>   reported a problem with *process3*: process is not running
> ×[Sep 16 2013 15:59:35] Host *myhost.example.com*
>   <https://im-on-it.crbs.ucsd.edu/status/hosts/detail?id=1656>
>   reported a problem with *process4*: process is not running
>
> multiplied by 20+ hosts
> you get the idea.
>
> The fact that the process isn't running is never a problem and I would
> like to reflect that somehow and also be able to have some insight into
> whats running where.
>
> Another thing I would really like to be able to do is pass args in the
> alert emails
>
> i.e. when the command process1 -t foo -o bar -cfg process1.cfg -v -X -s
> is run I'd be tickled if I could get "-t foo -o bar -cfg process1.cfg -v
> -X -s" (or even the entire content of monit procmatch) into the alert
> somehow
>
> I've only had this up and running for about a month and monit has saved my
> bacon on filesystem checks and dead services several times. Just wanting to
> do a bit more than the system side of things with it.
>
> --
> To unsubscribe:
> https://lists.nongnu.org/mailman/listinfo/monit-general

--
---------------------------------------------------------------------
()  ascii ribbon campaign - against html e-mail
/\
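
A minimal sketch of one way to get "alert when it starts and stops, but never count not-running as a failure", assuming a `check program` test is used instead of `check process`. The script name `proc_watch.py`, its install path, and the service name below are hypothetical, and the /proc scan assumes Linux. The idea is that the helper exits 0 when nothing matches (so the service counts as available) and exits 1 while a match is running, printing the full command line, arguments included. It could be wired up with something like `check program process1_watch with path "/usr/local/bin/proc_watch.py process1"` followed by `if status != 0 then alert`. Whether the printed output appears in the alert mail depends on the Monit version, so treat that part as an assumption to verify.

```python
#!/usr/bin/env python3
"""Hypothetical helper for a Monit `check program` test (not part of Monit)."""
import os
import sys


def matching_cmdlines(pattern, cmdlines):
    """Return {pid: cmdline} for every command line containing `pattern`."""
    return {pid: cmd for pid, cmd in cmdlines.items() if pattern in cmd}


def read_proc_cmdlines(proc_root="/proc"):
    """Collect {pid: cmdline} from a Linux-style /proc, skipping this script."""
    result = {}
    if not os.path.isdir(proc_root):
        return result
    for entry in os.listdir(proc_root):
        if not entry.isdigit() or int(entry) == os.getpid():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                raw = fh.read()
        except OSError:
            continue  # process exited in the meantime or is not readable
        # /proc/<pid>/cmdline separates arguments with NUL bytes.
        cmd = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        if cmd:
            result[int(entry)] = cmd
    return result


def main(pattern):
    """Print matching processes; return 1 if any are running, else 0."""
    found = matching_cmdlines(pattern, read_proc_cmdlines())
    for pid, cmd in sorted(found.items()):
        print(f"{pid}: {cmd}")
    return 1 if found else 0


# Act only when exactly one PATTERN argument is supplied on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

With this arrangement the alert fires when the process appears and a recovery notice follows when it goes away. The trade-off is that the cpu and memory thresholds from the `check process` stanzas are not covered, and the service will show as failed for as long as the process is running.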
