Re: How to convince others. Was: Re: mono keep guest active - ban the blips.

Marcy Cortes Thu, 19 Aug 2010 21:10:49 -0700

It'd be even cooler if your monitor could learn a virtual machine's "normal" or
"expected" activity pattern by time of day / day of week and then signal things
out of the ordinary.  Like the batch activity that was supposed to have been
running but took an unexpected low address protection exception and CPU dived
to .5%, or the online server whose new code release put it into an occasional
loop and chewed an engine for a while.  (Real-world examples from, oh, the last
3 weeks :)
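
To make the idea concrete, here is a minimal sketch of that kind of learned
baseline, assuming the monitor already collects a CPU percentage per guest.
The class name, the per-(weekday, hour) bucketing, and the default thresholds
are all hypothetical choices for illustration, not a feature of any existing
monitor.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev


class ActivityBaseline:
    """Hypothetical per-guest baseline keyed by (weekday, hour)."""

    def __init__(self, min_samples=4, tolerance=3.0, floor=2.0):
        self.history = defaultdict(list)  # (weekday, hour) -> CPU % samples
        self.min_samples = min_samples    # samples needed before judging
        self.tolerance = tolerance        # allowed deviation, in std devs
        self.floor = floor                # minimum band, in CPU % points

    @staticmethod
    def _slot(when):
        return (when.weekday(), when.hour)

    def learn(self, when, cpu_pct):
        """Record one observation of normal activity."""
        self.history[self._slot(when)].append(cpu_pct)

    def is_unusual(self, when, cpu_pct):
        """True if cpu_pct falls outside the learned band for this slot."""
        samples = self.history[self._slot(when)]
        if len(samples) < self.min_samples:
            return False  # not enough history yet, so stay quiet
        band = max(self.tolerance * stdev(samples), self.floor)
        return abs(cpu_pct - mean(samples)) > band


# Four Thursdays of a 02:00 batch job running near 95% CPU (made-up data).
baseline = ActivityBaseline()
for day, pct in [(22, 94.0), (29, 96.0)]:
    baseline.learn(datetime(2010, 7, day, 2, 0), pct)
for day, pct in [(5, 95.0), (12, 97.0)]:
    baseline.learn(datetime(2010, 8, day, 2, 0), pct)

# The batch that died and dropped to 0.5% is flagged; a normal night is not.
print(baseline.is_unusual(datetime(2010, 8, 19, 2, 0), 0.5))
print(baseline.is_unusual(datetime(2010, 8, 19, 2, 0), 96.0))
```

The same check catches both cases above: a guest that is far quieter than
usual for that slot, and one that is far busier. A real implementation would
also need to age out old samples and cope with schedule changes.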

The business of triggering on error messages is always a reactive thing.  You
get a message, you have a big problem because a bad message went unnoticed for
hours and something down the line failed, and people play cleanup.  You add
paging automation around that message for the next time...

All of this systems automation software could be a lot smarter... 

Marcy 

-----Original Message-----
From: Linux on 390 Port [mailto:[email protected]] On Behalf Of Rich Smrcina
Sent: Thursday, August 19, 2010 4:39 PM
To: [email protected]
Subject: Re: [LINUX-390] How to convince others. Was: Re: mono keep guest active - ban the blips.

If your batch runs regularly or consistently drive some virtual machines to
100%, this may not signal a loop condition (which, I would guess, is why the
ticket is being raised).  Techs may grow conditioned to this and either take
longer to respond or just outright 'ignore' the tickets eventually, since the
'normal' course of action is to page for a condition that is unresolvable
without a larger share or redistribution of the load.

If only the monitor could 'know' that the machine was running this batch load
at a certain time of day and had an absolute share and was running 100% for an
extended period of time.  It could be set up to not send out alerts based on
all of these criteria.  Wow!  That would be a very nice feature.
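
As a rough sketch of such a rule, assuming the monitor can see a guest's CPU
percentage, how long it has been at that level, and whether it has an absolute
share: every name, field, and default below is hypothetical and is not taken
from any real monitor's configuration.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class GuestState:
    """Hypothetical snapshot of what the monitor knows about a guest."""
    name: str
    cpu_pct: float
    minutes_at_level: int
    has_absolute_share: bool


@dataclass
class BatchWindow:
    """Time of day during which the batch load is expected to run."""
    start: time
    end: time

    def contains(self, now):
        if self.start <= self.end:
            return self.start <= now < self.end
        # The window wraps past midnight, e.g. 22:00 to 04:00.
        return now >= self.start or now < self.end


def should_alert(guest, now, window, cpu_threshold=100.0, sustained_minutes=15):
    """Alert on sustained high CPU unless it is the expected batch load."""
    busy = (guest.cpu_pct >= cpu_threshold
            and guest.minutes_at_level >= sustained_minutes)
    if not busy:
        return False
    expected = guest.has_absolute_share and window.contains(now)
    return not expected


window = BatchWindow(start=time(22, 0), end=time(4, 0))
batch_guest = GuestState("LNXBATCH", 100.0, 60, has_absolute_share=True)

print(should_alert(batch_guest, time(1, 30), window))  # inside the window
print(should_alert(batch_guest, time(12, 0), window))  # outside the window
```

With these made-up values, the guest pegged at 100% during its batch window
raises no alert, while the same reading at midday still does.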

When your monitoring department looks at top, vmstat and sar to detect
problems, don't forget the kernel numbers lie.  Even the new steal timer is a
little off.

On 08/19/2010 05:51 PM, Berry van Sleeuwen wrote:
> True, it isn't. It's the replacement of an operator. The main issue here
> is that it needs to raise tickets and get reporting stats. For instance,
> raise a ticket at 100% CPU (and indeed, our ABS limithard machines do
> raise tickets when they are running their batch..<sigh>.) or when a
> filesystem is at 100%. The reporting is for instance on CPU and
> filesystem usage.
>
> But indeed it can't provide insight in the performance of a guest, other
> than detect thresholds. And it doesn't have to either, the monitoring
> department can look at top, vmstat or sar to detect that kind of
> problems should they need to (yeah right, then they know all about the
> entire environment).
>
> Still, as for a case, this is a good point. We need to be able to
> address performance related monitoring and nagios can't do that. Or at
> least not within the scope of an entire LPAR.
>
> Thanks, Berry.
>

--
Rich Smrcina
Phone: 414-491-6001
http://www.linkedin.com/in/richsmrcina

Catch the WAVV! http://www.wavv.org
WAVV 2011 - April 15-19, 2011 Colorado Springs, CO

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
----------------------------------------------------------------------
For more information on Linux on System z, visit
http://wiki.linuxvm.org/
