dtrace -n ':::xcalls { @s[stack()] = count() } tick-1sec { trunc(@s,10); printa(@s); clear(@s); }'


That will tell us where the xcalls are coming from in the kernel,
and we can go from there.
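
The same one-liner can also be saved as a script file, which is easier to
edit and rerun. This is only a sketch: it spells out the sysinfo provider,
which the one-liner leaves as a wildcard, and it adds the quiet option to
suppress the default per-probe output. The file name xcalls.d is hypothetical.

```
#!/usr/sbin/dtrace -s

/* Assumption: quiet mode is an addition; the one-liner does not set it. */
#pragma D option quiet

/* Count cross-calls, keyed by the kernel stack that issued them. */
sysinfo:::xcalls
{
        @s[stack()] = count();
}

/* Once a second: keep the 10 hottest stacks, print them, zero the counts. */
tick-1sec
{
        trunc(@s, 10);
        printa(@s);
        clear(@s);
}
```

Run it as root with something like "dtrace -s xcalls.d" and stop it with
Ctrl-C once a stall has been captured.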

Thanks,
/jim


Jim Leonard wrote:
We have a 16-core x86 system that, at seemingly random intervals, will
completely stop responding for several seconds.  Running mpstat 1 showed
something horrifying:

CPU minf mjf    xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0 1004691  397  170   0    0    0    5   0     0   0 100  0   0
(rest of CPUs omitted)

That's over a million cross-calls a second.  Seeing them on CPU0 made me
nervous that they were kernel-related, so I wrote a DTrace script to print
xcalls per second aggregated by PID, to see if a specific process was the
culprit.  Here's the output during another random system outage:

2009 Sep 22 12:51:49, load average: 5.90, 5.35, 5.39   xcalls: 637511

   PID                        XCALLCOUNT
   6164                                15
   6165                                15
   28339                               26
   0                               637455

PID 0 is "sched" (aka the kernel).

At this point I'm completely stumped as to what could be causing this.  Any 
hints or ideas?
_______________________________________________
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org
