If your batch runs regularly or consistently drive some virtual machines to 100%, that may not signal a loop condition (which, I would guess, is why the ticket is being raised). Techs may become conditioned to this and either take longer to respond or eventually just 'ignore' the tickets outright, since the 'normal' course of action is to page for a condition that can't be resolved without a larger share or a redistribution of the load.
If only the monitor could 'know' that the machine was running this batch load at a certain time of day, had an absolute share, and was running at 100% for an extended period of time. It could be set up not to send out alerts based on all of these criteria. Wow! That would be a very nice feature.

When your monitoring department looks at top, vmstat and sar to detect problems, don't forget that the kernel numbers lie. Even the new steal timer is a little off.

On 08/19/2010 05:51 PM, Berry van Sleeuwen wrote:
True, it isn't. It's the replacement of an operator. The main issue here is that it needs to raise tickets and get reporting stats. For instance, raise a ticket at 100% CPU (and indeed, our ABS LIMITHARD machines do raise tickets when they are running their batch... <sigh>) or when a filesystem is at 100%. The reporting is, for instance, on CPU and filesystem usage. But indeed it can't provide insight into the performance of a guest, other than detecting thresholds. And it doesn't have to either; the monitoring department can look at top, vmstat or sar to detect those kinds of problems should they need to (yeah right, then they know all about the entire environment). Still, as for a case, this is a good point. We need to be able to address performance-related monitoring, and Nagios can't do that. Or at least not within the scope of an entire LPAR. Thanks, Berry.
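To make the wish above a bit more concrete, here is a rough sketch of that suppression rule in Python. Everything in it is an assumption for illustration: `GuestProfile`, `should_alert`, the 100% threshold and the 15-minute grace period are made-up names and values, not Nagios features, and the monitor would have to be told the batch window and the share setting by hand.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class GuestProfile:
    """Hypothetical per-guest facts the monitor would have to be told."""
    batch_start: time      # start of the known batch window
    batch_end: time        # end of the known batch window
    absolute_share: bool   # guest runs with an absolute share


def in_window(now: time, start: time, end: time) -> bool:
    """True if 'now' falls inside the window, including windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def should_alert(profile: GuestProfile, when: datetime, cpu_pct: float,
                 minutes_busy: int, threshold: float = 100.0,
                 grace_minutes: int = 15) -> bool:
    """Page only for sustained saturation that the batch schedule does not explain."""
    # Below the threshold, or only a short spike: nothing to report.
    if cpu_pct < threshold or minutes_busy < grace_minutes:
        return False
    # Sustained 100% on an absolute-share guest inside its batch window is expected.
    expected = profile.absolute_share and in_window(
        when.time(), profile.batch_start, profile.batch_end)
    return not expected


if __name__ == "__main__":
    batch_guest = GuestProfile(time(22, 0), time(4, 0), absolute_share=True)
    # Inside the batch window: stay quiet.
    print(should_alert(batch_guest, datetime(2010, 8, 19, 23, 30), 100.0, 60))
    # Same load at midday: probably worth a ticket.
    print(should_alert(batch_guest, datetime(2010, 8, 19, 12, 0), 100.0, 60))
```

The point of keeping the three conditions together is that a guest pegged at 100% outside its batch window still pages someone, so a real loop is not hidden by the suppression.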
--
Rich Smrcina
Phone: 414-491-6001
http://www.linkedin.com/in/richsmrcina

Catch the WAVV! http://www.wavv.org
WAVV 2011 - April 15-19, 2011 Colorado Springs, CO
