-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Follow-up: I found that cfexecd spawns a cfagent every 5 minutes during each scheduled quarter hour. In Q1 it spawns one at minutes 0, 5, and 10, and in Q3 it spawns one at minutes 30, 35, and 40. As a result, the load on the server has increased by a factor of 3 instead of dropping as I intended.
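The arithmetic behind that factor of 3 can be sketched in a few lines of Python. This is only a toy model, not cfengine code: it assumes cfexecd wakes on a fixed 5-minute grid and launches cfagent whenever any class named in the schedule is active at that minute. The helper names (class_minutes, launch_minutes) are made up for illustration. cfengine 2 also defines five-minute time classes such as Min00_05; whether listing those instead of Q1 and Q3 gives one run per period should be confirmed against the reference manual.

```python
# Toy model (assumption, not cfengine source): cfexecd wakes every
# `wake_interval` minutes and runs cfagent if any scheduled class is active.

QUARTERS = {
    "Q1": range(0, 15),
    "Q2": range(15, 30),
    "Q3": range(30, 45),
    "Q4": range(45, 60),
}


def class_minutes(name):
    """Return the minutes of the hour covered by a time class name."""
    if name in QUARTERS:
        return QUARTERS[name]
    # Five-minute classes of the form MinAA_BB, e.g. Min00_05 -> minutes 0..4
    if name.startswith("Min") and "_" in name:
        start, end = name[3:].split("_")
        return range(int(start), int(end))
    raise ValueError(f"unknown time class: {name}")


def launch_minutes(schedule, wake_interval=5):
    """List the wake-up minutes at which at least one scheduled class is active."""
    return [
        minute
        for minute in range(0, 60, wake_interval)
        if any(minute in class_minutes(name) for name in schedule)
    ]


if __name__ == "__main__":
    print(launch_minutes(["Q1", "Q3"]))              # [0, 5, 10, 30, 35, 40]
    print(launch_minutes(["Min00_05", "Min30_35"]))  # [0, 30]
```

Under these assumptions a quarter-hour class matches three wake-ups per period, while a five-minute class matches one.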
_____________________________________________________________________
Darryl Baker
gedas USA, Inc.
Operational Services Business Unit
3800 Hamlin Road
Auburn Hills, MI 48326
US
phone +1-248-754-5341
fax +1-248-754-6399
[EMAIL PROTECTED]
http://www.gedasusa.com
_____________________________________________________________________

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]] On Behalf Of Baker, Darryl
> Sent: Tuesday, March 15, 2005 12:31 PM
> To: help-cfengine@gnu.org
> Subject: Too many cfagents running. Was: Load problem with cfservd
>
> *** PGP Signature Status: good
> *** Signer: Darryl Philip Baker <[EMAIL PROTECTED]>
> *** Signed: 3/15/2005 12:31:14 PM
> *** Verified: 3/16/2005 9:10:01 AM
> *** BEGIN PGP VERIFIED MESSAGE ***
>
> *** PGP Signature Status: good
> *** Signer: Darryl Philip Baker <[EMAIL PROTECTED]>
> *** Signed: 3/15/2005 12:28:32 PM
> *** Verified: 3/15/2005 12:30:08 PM
> *** BEGIN PGP VERIFIED MESSAGE ***
>
> Installing the latest snapshot has reduced the problem with system
> loading on the master.
>
> Now I'm finding that cfexecd is starting one cfagent every 5
> minutes even though I have the schedule set to run only in Q1 and
> Q3: "schedule = ( Q1 Q3 )". Why?
>
> _____________________________________________________________________
> Darryl Baker
> gedas USA, Inc.
> Operational Services Business Unit
> 3800 Hamlin Road
> Auburn Hills, MI 48326
> US
> phone +1-248-754-5341
> fax +1-248-754-6399
> [EMAIL PROTECTED]
> http://www.gedasusa.com
> _____________________________________________________________________
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED]] On Behalf Of Baker, Darryl
> > Sent: Monday, March 14, 2005 4:08 PM
> > To: help-cfengine@gnu.org
> > Subject: Load problem with cfservd
> >
> > *** PGP Signature Status: good
> > *** Signer: Darryl Philip Baker <[EMAIL PROTECTED]>
> > *** Signed: 3/14/2005 4:08:02 PM
> > *** Verified: 3/15/2005 10:54:48 AM
> > *** BEGIN PGP VERIFIED MESSAGE ***
> >
> > My master machine is Solaris 9 and all systems are running
> > Solaris 8 or 9 and cfengine 2.1.13.
> >
> > The problem we have with cfservd manifests itself as a periodic
> > clog that takes about a minute to resolve. This period is
> > characterized by the following symptoms:
> >
> > 1. Load average spikes from ~3 (on a 4-processor system) to the
> > 6-8 range. Occasionally the spike breaks into double digits.
> > 2. Concurrent port 5308 (cfengine) sessions increase from a
> > base level of 0-4 to peaks in the 12-30 range, with the number of
> > LWPs in the cfservd processes tracking the number of connections
> > linearly. (Client systems are set to connect twice an hour with a
> > 25-minute 'splay time'.)
> > 3. Running lockstat shows severe contention for a single adaptive
> > mutex:
> >
> > [EMAIL PROTECTED]:proc# lockstat sleep 5
> >
> > Adaptive mutex spin: 157416 events in 5.040 seconds (31233 events/sec)
> >
> > Count  indv cuml rcnt     spin Lock        Caller
> > ---------------------------------------------------------------------
> > 136805  87%  87% 1.00       75 0x152ec90   sfmmu_mlist_enter+0x84
> > [...]
> > Adaptive mutex block: 648 events in 5.040 seconds (129 events/sec)
> >
> > Count  indv cuml rcnt     nsec Lock        Caller
> > ---------------------------------------------------------------------
> >    547  84%  84% 1.00   391652 0x152ec90   sfmmu_mlist_enter+0x84
> >
> > Both of those types of lock run about 2 orders of magnitude lower
> > in total, with the specific lock running as much as 3 orders of
> > magnitude lower (i.e. ~100 spins and no blocks), when the system
> > is in its 'calm' state.
> >
> > 4. The cfservd process becomes by far the top CPU user, eating
> > 10-25% of total CPU on a 4-processor system.
> > 5. The system retains some idle time (5-30%), but the time used by
> > the kernel jumps to the 40-70% range.
> >
> > The history of troubleshooting this leads me to believe that the
> > heavy ssh usage on this host is a significant compounding factor,
> > i.e. that we are hitting some common bottleneck when cfservd is
> > accepting connections while we are spawning batches of 30-100
> > outbound ssh connections at once. Reducing the herds of outbound
> > ssh connections has reduced the frequency and severity of these
> > clog periods, but every time we change much of anything on the
> > system, we end up back in a state where these clogs are common.
> >
> > _____________________________________________________________________
> > Darryl Baker
> > gedas USA, Inc.
> > Operational Services Business Unit
> > 3800 Hamlin Road
> > Auburn Hills, MI 48326
> > US
> > phone +1-248-754-5341
> > fax +1-248-754-6399
> > [EMAIL PROTECTED]
> > http://www.gedasusa.com
> > _____________________________________________________________________
> >
> > *** END PGP VERIFIED MESSAGE ***
>
> *** END PGP VERIFIED MESSAGE ***
>
> *** END PGP VERIFIED MESSAGE ***

-----BEGIN PGP SIGNATURE-----
Version: PGP Personal Security 7.0.3

iQA/AwUBQjg/SFe1Bhkj9lZeEQLLDwCfQoESiAjjH1RvS/SwZjGX98sRrXYAn1pQ
6FsWz4K8yitp5+l/Pi8JuPl1
=QylC
-----END PGP SIGNATURE-----
_______________________________________________
Help-cfengine mailing list
Help-cfengine@gnu.org
http://lists.gnu.org/mailman/listinfo/help-cfengine