Can you try the latest snapshot?

M

Quoting "Baker, Darryl" <[EMAIL PROTECTED]>:

> My master machine is Solaris 9 and all systems are running Solaris 8
> or 9 and cfengine 2.1.13.
>
> The problem we have with cfservd manifests itself as a periodic clog
> that takes about a minute to resolve. This period is characterized by
> the following symptoms:
>
> 1. Load average spikes from ~3 (on a 4-processor system) to the 6-8
>    range. Occasionally the spike breaks into double digits.
> 2. Concurrent port 5308 (cfengine) sessions increase from a base
>    level of 0-4 to peaks in the 12-30 range, with the number of LWPs
>    in the cfservd processes tracking the number of connections
>    linearly. (Client systems are set to connect twice an hour with a
>    25-minute 'splay time'.)
> 3. Running lockstat shows severe contention for a single adaptive
>    mutex:
>
> [EMAIL PROTECTED]:proc# lockstat sleep 5
>
> Adaptive mutex spin: 157416 events in 5.040 seconds (31233 events/sec)
>
>  Count indv cuml rcnt     spin Lock         Caller
> ---------------------------------------------------------------------
> 136805  87%  87% 1.00       75 0x152ec90    sfmmu_mlist_enter+0x84
> [...]
>
> Adaptive mutex block: 648 events in 5.040 seconds (129 events/sec)
>
>  Count indv cuml rcnt     nsec Lock         Caller
> ---------------------------------------------------------------------
>    547  84%  84% 1.00   391652 0x152ec90    sfmmu_mlist_enter+0x84
>
>    Both of those types of lock run about 2 orders of magnitude lower
>    in total, with the specific lock running as much as 3 orders of
>    magnitude lower (i.e. ~100 spins and no blocks) when the system is
>    in its 'calm' state.
>
> 4. The cfservd process becomes by far the top CPU user, eating 10-25%
>    of total CPU on a 4-processor system.
> 5. The system retains some idle time (5-30%) but the time used by the
>    kernel jumps to the 40-70% range.
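
To compare the 'clog' and 'calm' states numerically, the lockstat rows quoted above can be pulled apart with a short script. This is only a sketch: the row layout is assumed from the two rows shown in this message, the sample text is copied from them, and the function names and the 50% threshold are made up for illustration.

```python
import re

# One lockstat data row, as it appears in the quoted output, e.g.
#   136805  87%  87% 1.00       75 0x152ec90    sfmmu_mlist_enter+0x84
# Column layout is an assumption based on the rows quoted in this thread.
ROW = re.compile(
    r"^\s*(?P<count>\d+)\s+(?P<indv>\d+)%\s+(?P<cuml>\d+)%\s+"
    r"(?P<rcnt>[\d.]+)\s+(?P<value>\d+)\s+(?P<lock>\S+)\s+(?P<caller>\S+)\s*$"
)

# Sample rows copied from the message above (spin row, then block row).
SAMPLE = """\
> 136805  87%  87% 1.00       75 0x152ec90    sfmmu_mlist_enter+0x84
> [...]
>    547  84%  84% 1.00   391652 0x152ec90    sfmmu_mlist_enter+0x84
"""


def parse_rows(text):
    """Return one dict per lockstat data row found in text."""
    rows = []
    for line in text.splitlines():
        line = line.lstrip("> ")  # tolerate mail-quoted output
        m = ROW.match(line)
        if m:
            rows.append({
                "count": int(m.group("count")),
                "indv": int(m.group("indv")),    # percent of events in section
                "value": int(m.group("value")),  # spin count or nsec blocked
                "lock": m.group("lock"),
                "caller": m.group("caller"),
            })
    return rows


def dominant_lock(text, threshold=50):
    """Return the busiest row if it holds at least threshold percent, else None."""
    rows = parse_rows(text)
    if not rows:
        return None
    top = max(rows, key=lambda r: r["count"])
    return top if top["indv"] >= threshold else None


if __name__ == "__main__":
    top = dominant_lock(SAMPLE)
    if top is not None:
        print(top["lock"], top["caller"], top["indv"])
```

Running the same script over lockstat output captured during a calm period would show whether any single lock stands out there at all.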
>
> The history of troubleshooting this leads me to believe that the
> heavy ssh usage on this host is a significant compounding factor,
> i.e. that we are hitting some common bottleneck when we have cfservd
> accepting connections and are spawning batches of 30-100 outbound ssh
> connections at once. Reducing the herds of outbound ssh's has reduced
> the frequency and severity of these clog periods, but every time we
> change much of anything on the system, we end up getting back to a
> state where these clogs become common.
>
> _____________________________________________________________________
> Darryl Baker
> gedas USA, Inc.
> Operational Services Business Unit
> 3800 Hamlin Road
> Auburn Hills, MI 48326
> US
> phone +1-248-754-5341
> fax   +1-248-754-6399
> [EMAIL PROTECTED]
> http://www.gedasusa.com
> _____________________________________________________________________

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Work: +47 22453272            Email: [EMAIL PROTECTED]
Fax : +47 22453205            WWW  : http://www.iu.hio.no/~mark
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

_______________________________________________
Help-cfengine mailing list
Help-cfengine@gnu.org
http://lists.gnu.org/mailman/listinfo/help-cfengine