Jim Mauro has provided an excellent starting point. Keep in mind that kernel 
threads show up as PID 0, so you may be seeing a kernel thread causing the 
activity.
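
One way to see which kernel code is responsible is to aggregate the 
cross-calls charged to PID 0 by kernel stack. The following is only a sketch, 
not a tested diagnosis for this system: it assumes the sysinfo provider's 
xcalls probe, and the cutoff of 10 stacks is an arbitrary choice.

```dtrace
#!/usr/sbin/dtrace -s

/* Only cross-calls attributed to PID 0, keyed by the kernel stack
   that was active when the probe fired. */
sysinfo:::xcalls
/pid == 0/
{
	@[stack()] = count();
}

/* On exit (Ctrl-C), keep only the 10 most frequent stacks;
   DTrace prints the aggregation automatically. */
dtrace:::END
{
	trunc(@, 10);
}
```

Run it during one of the stalls and stop it with Ctrl-C; the stacks with the 
highest counts should point at the kernel path issuing the cross-calls.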

Jim L

----------Original Message----------

From: Jim Leonard <trix...@oldskool.org>
Sent: Tue, September 22, 2009 11:31 AM
To: dtrace-discuss@opensolaris.org
Subject: [dtrace-discuss] How to drill down cause of cross-calls in the kernel? 
(output provided)


We have a 16-core x86 system that, at seemingly random intervals, will 
completely stop responding for several seconds.  Running "mpstat 1" showed 
something horrifying:

CPU minf mjf    xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0 1004691  397  170   0    0    0    5   0     0   0 100  0   0
(rest of CPUs omitted)

That's over a million cross-calls a second.  Seeing them on CPU 0 made me 
nervous that they were kernel-related, so I wrote a DTrace script to print 
xcalls per second, aggregated by PID, to see if a specific process was the 
culprit.  Here's the output during another random system outage:

2009 Sep 22 12:51:49, load average: 5.90, 5.35, 5.39   xcalls: 637511

   PID                        XCALLCOUNT
   6164                                15
   6165                                15
   28339                               26
   0                               637455

PID 0 is "sched" (aka the kernel).
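
For reference, a per-PID cross-call counter of the kind described above can 
be sketched roughly as follows. This is not the exact script that produced 
the output; it is a minimal version that assumes the sysinfo provider's 
xcalls probe and omits the load-average and total lines.

```dtrace
#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Count each cross-call against the PID that was on the CPU
   when the probe fired. */
sysinfo:::xcalls
{
	@xc[pid] = count();
}

/* Once per second, print a timestamp and the per-PID counts,
   then clear the aggregation for the next interval. */
profile:::tick-1sec
{
	printf("\n%Y\n\n", walltimestamp);
	printf("%8s %32s\n", "PID", "XCALLCOUNT");
	printa("%8d %@32d\n", @xc);
	trunc(@xc);
}
```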

At this point I'm completely stumped as to what could be causing this.  Any 
hints or ideas?
-- 
This message posted from opensolaris.org
_______________________________________________
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org