-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

My master machine runs Solaris 9; all systems run Solaris 8 or 9 with cfengine 2.1.13.
The problem we have with cfservd manifests itself as a periodic clog that takes about a minute to resolve. Each clog is characterized by the following symptoms:

1. The load average spikes from ~3 (on a 4-processor system) to the 6-8 range. Occasionally the spike breaks into double digits.

2. Concurrent port 5308 (cfengine) sessions increase from a base level of 0-4 to peaks in the 12-30 range, with the number of LWPs in the cfservd processes tracking the number of connections linearly. (Client systems are set to connect twice an hour with a 25-minute splay time.)

3. Running lockstat shows severe contention for a single adaptive mutex:

   [EMAIL PROTECTED]:proc# lockstat sleep 5

   Adaptive mutex spin: 157416 events in 5.040 seconds (31233 events/sec)

    Count indv cuml rcnt     spin Lock                   Caller
   -------------------------------------------------------------------------------
   136805  87%  87% 1.00       75 0x152ec90              sfmmu_mlist_enter+0x84
   [...]

   Adaptive mutex block: 648 events in 5.040 seconds (129 events/sec)

    Count indv cuml rcnt     nsec Lock                   Caller
   -------------------------------------------------------------------------------
      547  84%  84% 1.00   391652 0x152ec90              sfmmu_mlist_enter+0x84

   When the system is in its 'calm' state, both of those event types run about 2 orders of magnitude lower in total, and this specific lock runs as much as 3 orders of magnitude lower (i.e. ~100 spins and no blocks).

4. The cfservd process becomes by far the top CPU user, eating 10-25% of total CPU on a 4-processor system.

5. The system retains some idle time (5-30%), but the time used by the kernel jumps to the 40-70% range.

The history of troubleshooting this leads me to believe that the heavy ssh usage on this host is a significant compounding factor, i.e. that we are hitting some common bottleneck when cfservd is accepting connections while we are spawning batches of 30-100 outbound ssh connections at once.
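For comparing 'calm' and 'clogged' samples, the lockstat excerpt under symptom 3 can be summarized mechanically. The following is only a minimal Python sketch, not part of the original report: the function name top_locks, the regular expression, and the assumed column layout (Count, indv, cuml, rcnt, spin or nsec, Lock, Caller) are illustrative assumptions based on the rows quoted above, and the sample text contains only the figures from this message, with the elided rows left out.

```python
import re

# One data row of a lockstat mutex table, assuming the column layout quoted
# in the message: Count indv cuml rcnt spin|nsec Lock Caller
ROW = re.compile(
    r"^\s*(?P<count>\d+)\s+(?P<indv>\d+)%\s+(?P<cuml>\d+)%\s+"
    r"(?P<rcnt>[\d.]+)\s+(?P<cost>\d+)\s+(?P<lock>\S+)\s+(?P<caller>\S+)\s*$"
)


def top_locks(text):
    """Return {section: (count, lock, caller)} for the busiest row per section."""
    best = {}
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Adaptive mutex"):
            # e.g. "Adaptive mutex spin: 157416 events in ..." -> "Adaptive mutex spin"
            section = stripped.split(":", 1)[0]
            continue
        match = ROW.match(line)
        if match and section:
            row = (int(match["count"]), match["lock"], match["caller"])
            if section not in best or row[0] > best[section][0]:
                best[section] = row
    return best


# Figures copied from the message; rows elided there ("[...]") are omitted here.
SAMPLE = """\
Adaptive mutex spin: 157416 events in 5.040 seconds (31233 events/sec)
Count indv cuml rcnt     spin Lock                   Caller
136805  87%  87% 1.00       75 0x152ec90              sfmmu_mlist_enter+0x84
Adaptive mutex block: 648 events in 5.040 seconds (129 events/sec)
Count indv cuml rcnt     nsec Lock                   Caller
   547  84%  84% 1.00   391652 0x152ec90              sfmmu_mlist_enter+0x84
"""

if __name__ == "__main__":
    for name, (count, lock, caller) in top_locks(SAMPLE).items():
        print(f"{name}: {count} events on {lock} ({caller})")
```

Feeding it the output of successive lockstat runs would show whether the same lock address and caller dominate every clog period, which is what the two quoted rows suggest.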
Reducing the herds of outbound ssh's has reduced the frequency and severity of these clog periods, but every time we change much of anything on the system, we end up getting back to a state where these clogs become common.

_____________________________________________________________________
Darryl Baker
gedas USA, Inc.
Operational Services Business Unit
3800 Hamlin Road
Auburn Hills, MI 48326 US
phone +1-248-754-5341
fax +1-248-754-6399
[EMAIL PROTECTED]
http://www.gedasusa.com
_____________________________________________________________________

-----BEGIN PGP SIGNATURE-----
Version: PGP Personal Security 7.0.3

iQA/AwUBQjX9Mle1Bhkj9lZeEQLTgQCeNHbP4+Zf+P2luqNx/QRNpLeOYF8AnRvL
BXCjcj0Rs4JDtgcQzjKv016V
=IHlF
-----END PGP SIGNATURE-----
_______________________________________________ Help-cfengine mailing list Help-cfengine@gnu.org http://lists.gnu.org/mailman/listinfo/help-cfengine