I have one node in a 4-node Oracle CRS cluster that has trouble running
DTrace scripts. The scripts often abort with "Abort due to systemic
unresponsiveness". Judging by traditional monitoring tools, this node does
not seem to be under any more load than the other nodes, where DTrace runs
just fine.

I know of the work-arounds, but these CRS clusters crash easily, so I'd
rather play it safe.  I'm just trying to figure out what it is about this
particular server that keeps it from running DTrace.

Version of Solaris: Oracle Solaris 10 1/13 s10x_u11wos_24a X86

I've had the DTraceToolkit's procsystime abort.  iotop and iosnoop run
fine.

I've had lockstat abort:

$> lockstat -x aggsize=24m -kWP sleep 30
lockstat: dtrace_status(): Abort due to systemic unresponsiveness

Here is a script called "inttimes.d" that aborts consistently:
 ....
#pragma D option quiet
#pragma D option dynvarsize=512m
dtrace:::BEGIN
{
        printf("Tracing... Hit Ctrl-C to end.\n");
}
sdt:::interrupt-start
{
        self->ts = vtimestamp;
}

sdt:::interrupt-complete
/self->ts && arg0 != 0/
{
        this->devi = (struct dev_info *)arg0;
        /* check that the pointer is valid before dereferencing it */
        self->name = this->devi != 0 ?
            stringof(`devnamesp[this->devi->devi_major].dn_name) : "?";
        this->inst = this->devi != 0 ? this->devi->devi_instance : 0;
        @num[self->name, this->inst] = sum(vtimestamp - self->ts);
        self->name = 0;
}
sdt:::interrupt-complete
{
        self->ts = 0;
}
dtrace:::END
{
        printf("%11s    %16s\n", "DEVICE", "TIME (ns)");
        printa("%10s%-3d %@16d\n", @num);
}

Any suggestions for debugging this?


