On 09/18/14 22:32, Ski Kacoroski wrote:
>
> End result is that the response time issue is an artifact of the monitoring
> tool data collection methodology not a real problem.  Lesson learned is that
> when I see strangeness I need to really understand how I collected the data
> showing the strangeness.
>

So, it was judged slow based on monitoring, without empirical evidence. We
often do the reverse to our users here... but this reminds me of a problem I
found recently.

Users complained that a system had become slow. (It wasn't until later that
someone mentioned it seemed really bad for a few minutes about every 10
minutes, a detail that would have sped up the discovery.)

We have a data collection policy that is part of every CFEngine run on our
hosts. Actually, it is one of several, and this one was being run at the start
of the sequence. Basically, that was early enough to avoid detecting the impact
of what the agent itself was doing to the system, which on this machine was
higher and lasted longer than usual.

We do agent runs every 10 minutes. When I looked, there seemed to be a lot of
extra I/O related to the agent's lock database. Around then, I stumbled upon
the tcdb fix bundle pull request by Nick Anderson... and its history about
corruption and bloating problems with cf_lock.tcdb.

Now the agent seems to be back to its normal high CPU & I/O impact, the same
as on my other systems...

I'm still struggling with how locks work in CFEngine... though I think my
understanding was helped by reading a paper from LISA '97...

Probably time to put together a request to see if I'll get to LISA this year...

-- 
Who: Lawrence K. Chen, P.Eng. - W0LKC - Sr. Unix Systems Administrator
For: Enterprise Server Technologies (EST) -- & SafeZone Ally
_______________________________________________
Discuss mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
 http://lopsa.org/
