dtrace -n ':::xcalls { @s[stack()] = count() } tick-1sec { trunc(@s,10); printa(@s); clear(@s); }'
That will tell us where the xcalls are coming from in the kernel, and we can go from there.

Thanks,
/jim

Jim Leonard wrote:
We have a 16-core x86 system that, at seemingly random intervals, will completely stop responding for several seconds. Running mpstat 1 showed something horrifying:

CPU minf mjf    xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0 1004691  397  170   0    0    0    5   0     0   0 100  0   0
(rest of CPUs omitted)

That's over a million cross-calls a second. Seeing them on CPU0 made me nervous that they were kernel-related, so I wrote a dtrace script to print out xcalls per second aggregated by PID, to see if a specific process was the culprit. Here's the output during another random system outage:

2009 Sep 22 12:51:49, load average: 5.90, 5.35, 5.39   xcalls: 637511

   PID   XCALLCOUNT
  6164           15
  6165           15
 28339           26
     0       637455

PID 0 is "sched" (aka the kernel). At this point I'm completely stumped as to what could be causing this. Any hints or ideas?
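
The per-PID script itself was not posted. For anyone who wants to reproduce that kind of output, a minimal sketch in D might look like the following. It is an assumption about how such a script could be written, not the original: the aggregation names @total and @bypid and the column widths are made up here, and the load-average field from the output above is left out.

```dtrace
#!/usr/sbin/dtrace -s
/* Sketch only: counts cross-calls per second, keyed by PID. */
#pragma D option quiet

/* Fires each time a cross-call is issued; pid is the process on-CPU. */
sysinfo:::xcalls
{
	@total = count();      /* hypothetical name: all xcalls this interval */
	@bypid[pid] = count(); /* hypothetical name: xcalls per PID */
}

/* Once a second, print both aggregations and start over. */
profile:::tick-1sec
{
	printf("%Y  ", walltimestamp);
	printa("xcalls: %@d\n", @total);
	printf("%8s %12s\n", "PID", "XCALLCOUNT");
	printa("%8d %@12d\n", @bypid);
	trunc(@total);
	trunc(@bypid);
}
```

Run it as root and stop it with Ctrl-C. The count attributed to PID 0 covers cross-calls issued while kernel threads were on CPU, which is why aggregating on stack() instead of pid is the natural next step.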
_______________________________________________
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org