I have one node in a 4-node Oracle CRS cluster that is having trouble running DTrace scripts. The scripts often abort with the error "Abort due to systemic unresponsiveness". Judging by traditional monitoring tools, this node does not seem to be under any more load than the other nodes, where DTrace runs just fine.
I know of the workarounds, but these CRS clusters crash easily, so I'd rather play it safe. I'm just trying to figure out what is going on with this particular server that it can't run DTrace...

Version of Solaris: Oracle Solaris 10 1/13 s10x_u11wos_24a X86

I've had the DTrace Toolkit's procsystime abort. iotop and iosnoop run fine.

I've had lockstat abort:

    $> lockstat -x aggsize=24m -kWP sleep 30
    lockstat: dtrace_status(): Abort due to systemic unresponsiveness

Here is a script called "inttimes.d" that aborts consistently:

    ....
    #pragma D option quiet
    #pragma D option dynvarsize=512m

    dtrace:::BEGIN
    {
            printf("Tracing... Hit Ctrl-C to end.\n");
    }

    sdt:::interrupt-start
    {
            self->ts = vtimestamp;
    }

    sdt:::interrupt-complete
    /self->ts && arg0 != 0/
    {
            this->devi = (struct dev_info *)arg0;
            /* this checks the pointer is valid, */
            self->name = this->devi != 0 ?
                stringof(`devnamesp[this->devi->devi_major].dn_name) : "?";
            this->inst = this->devi != 0 ? this->devi->devi_instance : 0;
            @num[self->name, this->inst] = sum(vtimestamp - self->ts);
            self->name = 0;
    }

    sdt:::interrupt-complete
    {
            self->ts = 0;
    }

    dtrace:::END
    {
            printf("%11s %16s\n", "DEVICE", "TIME (ns)");
            printa("%10s%-3d %@16d\n", @num);
    }

Any suggestions for debugging this?