On 09/18/14 22:32, Ski Kacoroski wrote:

> 
> End result is that the response time issue is an artifact of the monitoring
> tool data collection methodology not a real problem.  Lesson learned is that
> when I see strangeness I need to really understand how I collected the data
> showing the strangeness.
> 

So, it was slow according to the monitoring, without empirical evidence of
a real slowdown.  We often do the reverse to our users here.... but this
reminds me of a problem I found recently.

Users complained that a system had become slow.  (It wasn't until later that
anyone mentioned it seemed really bad for a few minutes about every 10
minutes.... knowing that sooner would've sped up the discovery.)

We have a data collection policy that is part of every cfengine run on hosts.
Actually, it is one of several, and this one was being done at the start of
the sequence.  Basically, early enough to avoid detecting the impact of what
the agent was doing to the system.  On this machine, that impact was higher
and lasted longer than usual.

We do agent runs every 10 minutes.
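
To illustrate the blind spot, here is a toy sketch in Python.  The numbers are
made up (how long the agent is busy, the load values), and it is not our
actual policy, just a model of sampling once per run at a fixed offset:

```python
# Hypothetical numbers for illustration only: the agent starts every 600 s
# and is assumed to load the host for the first 180 s of each run.
RUN_INTERVAL = 600               # seconds between agent runs (10 minutes)
BUSY_SECONDS = 180               # assumed duration of the agent's heavy I/O
IDLE_LOAD, BUSY_LOAD = 1.0, 8.0  # made-up load values

def load_at(t):
    """Synthetic load: high while the agent is working, low otherwise."""
    return BUSY_LOAD if 0 < t % RUN_INTERVAL <= BUSY_SECONDS else IDLE_LOAD

def sample(offset, hours=1):
    """Take one sample per run, `offset` seconds after each run starts."""
    return [load_at(start + offset)
            for start in range(0, hours * 3600, RUN_INTERVAL)]

start_samples = sample(offset=0)   # collected before the agent does any work
mid_samples = sample(offset=60)    # collected while the agent is busy

print(max(start_samples))  # 1.0
print(max(mid_samples))    # 8.0
```

In this model the host is busy 30% of the time, yet the samples taken at the
start of each run never see it.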

When I looked, there seemed to be a lot of extra I/O related to the agent's
lock database.  Around then, I stumbled upon the tcdb fix bundle pull request
by Nick Anderson.... and its history about corruption and bloating problems
with cf_lock.tcdb.

Now it seems to be back to the normal high CPU & I/O impact that I see on my
other systems....

Still struggling with how locks work in CFEngine.... though I think my
understanding was helped by reading a paper from LISA '97....

Probably time to put together a request to see if I'll get to LISA this year...

-- 
Who: Lawrence K. Chen, P.Eng. - W0LKC - Sr. Unix Systems Administrator
For: Enterprise Server Technologies (EST) -- & SafeZone Ally
_______________________________________________
Discuss mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
 http://lopsa.org/
