Is java running with the -server option on x86? There's a known bug where jstack() can't capture stacks On Nov 24, 2008, at 11:57 AM, Fernando Gleiser wrote:
> >> From time to time a jboss process would end eating all, the >> available CPU and the load avg would skyrocket. > > Once the operators restarted jboss, the system'd be normal again > (sometimes for weeks) until the next incident > > Since we moved the app from a v440 running Solaris 10 8/07 to a > t2000 running Solaris 10 5/08 the problem started to happen more > frequently (2-3 times a week). The only other modification (besides > hard and OS release) is we put the app server inside a sparse-zone > container with the FSS and a 95% limit on the CPU usage. > > Here's what uptime says: > > Mon Nov 24 10:10:00 ARST 2008 > 10:10am up 10 day(s), 14:40, 6 users, load average: 325.52, > 320.07, 318.72 > > (yes, load avg is almost 320, but the server was still usable) > > a little output from mpstat shows this: > > CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr > sys wt idl > 0 0 0 531 259 150 46 27 17 43 0 29 99 > 1 0 0 > 1 0 0 3 66 0 124 65 36 10 0 86 99 > 1 0 0 > 2 0 0 0 23 0 34 22 17 9 0 8 100 > 0 0 0 > 3 0 0 0 13 0 13 12 6 9 0 1 100 > 0 0 0 > 4 0 0 0 13 0 14 12 9 9 0 2 100 > 0 0 0 > 5 0 0 0 38 0 69 37 17 8 0 23 100 > 0 0 0 > 6 0 0 0 25 0 40 24 13 8 0 7 95 > 5 0 0 > 7 0 0 0 15 0 17 14 10 9 0 3 100 > 0 0 0 > 8 0 0 0 11 0 10 10 6 8 0 0 100 > 0 0 0 > 9 0 0 0 12 0 11 11 7 9 0 0 100 > 0 0 0 > 10 0 0 0 14 0 13 13 7 9 0 0 100 > 0 0 0 > 11 0 0 0 13 0 12 12 6 9 0 0 100 > 0 0 0 > 12 0 0 0 25 0 37 24 14 8 0 25 100 > 0 0 0 > 13 0 0 0 13 0 13 12 9 9 0 1 100 > 0 0 0 > 14 0 0 0 13 0 12 12 4 9 0 0 100 > 0 0 0 > 15 0 0 0 13 0 12 12 7 9 0 0 100 > 0 0 0 > 16 0 0 0 12 0 11 11 5 7 0 0 100 > 0 0 0 > 17 0 0 0 14 0 13 13 8 7 0 0 100 > 0 0 0 > 18 0 0 0 13 0 12 12 8 9 0 0 100 > 0 0 0 > 19 0 0 1 13 0 12 12 7 8 0 0 100 > 0 0 0 > 20 0 0 7 25 0 35 25 13 9 0 3 100 > 0 0 0 > 21 0 0 72 45 3 78 41 29 11 0 19 99 > 1 0 0 > 22 0 0 3 44 4 71 39 21 11 0 135 100 > 0 0 0 > 23 0 0 3 20 5 16 14 7 9 0 2 100 > 0 0 0 > 24 0 0 3 17 5 13 12 6 9 0 2 99 > 1 0 0 > 25 0 0 1 15 4 10 10 8 8 0 0 100 > 0 0 0 > 26 0 0 8 15 0 17 14 5 10 0 0 100 > 0 0 0 > 27 0 0 0 12 0 11 11 7 8 0 0 100 > 0 0 0 > 28 0 0 1 11 0 10 10 6 8 0 0 100 > 0 0 0 > 29 0 0 2 67 0 126 66 30 11 0 48 100 > 0 0 0 > 30 0 0 0 19 0 25 18 11 8 0 232 100 > 0 0 0 > 31 0 0 1 27 0 47 26 12 10 0 10 100 > 0 0 0 > > > > So, 100% cpu in usr mode looks a lot like some kind of infinite loop > > I tried to dig it up using dtrace: > > profile-1001us > > /(execname=="java") && (pid==$1)/ > > { > > @[jstack()]=count(); > > } > > > > tick-20s > > { > > exit(O); > > } > > > But I got a LOT of these arrors: > > dtrace: error on enabled probe ID 1 (ID 52070: > profile:::Profile-1001us): invalid address (0x96a7e000) in action #2 > > and th stack traces are mostly the hex addresses without the names: > > > 0xf886ca64 > 0xf884cec0 > 0xf8805d3c > 0xf8805874 > 0xf8805c70 > 0xf8805c70 > 0xf8805d3c > 0xf88380b8 > 0xf9509440 > 0xf88b7b44 > 0xf8805874 > 0xf8805874 > 0xf8805764 > 0xf8805874 > 0xf8805874 > 0xf8805d3c > 0xf8805874 > 0xf8805874 > 0xf8805874 > 0xf8805874 > 0xf8805874 > 0xf8805874 > 0xf8805764 > 0xf8805764 > 0xf8805764 > 0xf886993c > 0xf9999004 > 0xf884cec0 > 0xf8805874 > 0xf8805d3c > 0xf884cc7c > 0xf994090c > 0xf8838df8 > 0xf8800218 > > libjvm > .so > ` > __1cJJavaCallsLcall_helper6FpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_pnGThread__v_ > > +0x5a0 > libjvm.so`JVM_DoPrivileged+0x500 > > libjava > .so > ` > Java_java_security_AccessController_doPrivileged__Ljava_security_PrivilegedExceptionAction_2Ljava_security_AccessControlContext_2 > > +0x14 > 0xf880e22c > 0xf880e1d0 > 0xf89c6470 > 0xf9dfc018 > 0xf99ee350 > 0xf9d74fd8 > 0xf8838df8 > 0xf8800218 > > libjvm > .so > ` > __1cJJavaCallsLcall_helper6FpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_pnGThread__v_ > > +0x5a0 > libjvm.so`JVM_DoPrivileged+0x500 > > libjava > .so > ` > Java_java_security_AccessController_doPrivileged__Ljava_security_PrivilegedExceptionAction_2 > > +0x14 > 0xf9467cc8 > 0xf9badd44 > 9852 > > > So my questions are: > > any ideas on what to do next? I'm pretty sure it's a jboss/ > application problem, but I need to get more data to show the jboss/ > devel people > what's causing those dtrace errors? > > > thanks in advance > > > > _______________________________________________ > dtrace-discuss mailing list > dtrace-discuss@opensolaris.org -- Adam Leventhal, Fishworks http://blogs.sun.com/ahl _______________________________________________ dtrace-discuss mailing list dtrace-discuss@opensolaris.org