Our database suddenly became very slow. Using truss, I found that the
pread syscalls from Oracle were slow (130+ ms), but iostat and vxstat (we
use Veritas QIO) showed nothing abnormal. vmstat showed that sys CPU time
was very high.

The problem started after we got an error from in.mpathd. It is hard to
believe that in.mpathd itself caused the problem: it is a single-threaded
process (per prstat -L), so it shouldn't use that much sys time. But it is
very likely that in.mpathd was the root cause, because we hit a similar
issue on another database server, where the database became unresponsive
soon after in.mpathd threw an error. in.mpathd logged the following:
Jan 14 09:12:53 sjcdb475 in.mpathd[413]: [ID 585766 daemon.error] Cannot
meet requested failure detection time of 10000 ms on (inet nxge0) new
failure detection time for group "mnic" is 89894 ms
Jan 14 09:13:53 sjcdb475 in.mpathd[413]: [ID 302819 daemon.error] Improved
failure detection time 44947 ms on (inet nxge3) for group "mnic"
Jan 14 09:13:53 sjcdb475 in.mpathd[413]: [ID 302819 daemon.error] Improved
failure detection time 22473 ms on (inet nxge0) for group "mnic"

On Tue, Jan 20, 2009 at 10:06 PM, Chad Mynhier <cmynh...@gmail.com> wrote:

> On Tue, Jan 20, 2009 at 6:23 AM, Chad Mynhier <cmynh...@gmail.com> wrote:
> >
> > If you don't care about the stack per se, and if it's available to you
> > (I don't know off the top of my head which version this went back
> > into), you could also just aggregate on the kernel function using
> > '@c[func(arg0)]'.
>
> To add a note to this:  even though you could do this, it's pretty
> unlikely that it's going to be informative.  You'll likely end up
> spending a lot of time in some utility function coming from a number
> of different code paths.  Here's some sample output:
>
>  genunix`rm_assize                                              3642
>  unix`lock_set                                                  3725
>  unix`mutex_exit                                                3922
>  genunix`sleepq_wakeone_chan                                    4454
>  unix`xc_loop                                                   5991
>  unix`disp_getwork                                              6755
>  unix`utl0                                                      6963
>  unix`fp_restore                                                7001
>  FJSV,SPARC64-VI`copyout                                        7204
>  unix`mutex_enter                                              12547
>
> It does you no good to know that you spent a lot of time in
> mutex_enter(), because you don't have any information as to why.
> Better information is more likely to be found deeper in the stack.
>
> Chad
>
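
For reference, here is a minimal sketch of the two aggregations discussed in
the quoted messages: one keyed on the kernel function ('@c[func(arg0)]') and
one keyed on the full kernel stack. The 997 Hz sampling rate, the 30-second
window, the top-10 truncation, and the aggregation names are my own
assumptions, not something from the thread. As Chad notes, func() may not
be available on every release.

```d
#!/usr/sbin/dtrace -s
#pragma D option quiet

/*
 * Sample all CPUs at 997 Hz (assumed rate). For the profile provider,
 * arg0 is the kernel PC and is non-zero only when the CPU was running
 * in the kernel, so the predicate keeps kernel samples only.
 */
profile-997
/arg0/
{
	/* Count samples per kernel function. */
	@byfunc[func(arg0)] = count();

	/* Count samples per kernel stack, to see the calling code paths. */
	@bystack[stack()] = count();
}

/* After 30 seconds (assumed window), print the top 10 of each and exit. */
tick-30s
{
	trunc(@byfunc, 10);
	trunc(@bystack, 10);

	printf("Top kernel functions by sample count:\n");
	printa(@byfunc);

	printf("\nTop kernel stacks by sample count:\n");
	printa(@bystack);

	exit(0);
}
```

The per-function counts show where the sys time goes. The per-stack counts
are the ones more likely to show why, for example which callers end up in
mutex_enter().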
_______________________________________________
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org
