We have a database that suddenly became very slow. When I ran truss, I found that the pread syscall from Oracle was slow (130+ ms), but iostat and vxstat (we use Veritas QIO) show nothing abnormal. From vmstat we can see that sys CPU is very high.
The problem started after we encountered an error from in.mpathd. It is hard to believe that in.mpathd caused the problem: it is a single-threaded process (per prstat -L), so it shouldn't use that much sys time. But it is very likely that in.mpathd was the root cause, because we hit a similar issue on another database server, where the database became unresponsive soon after in.mpathd threw an error.

Jan 14 09:12:53 sjcdb475 in.mpathd[413]: [ID 585766 daemon.error] Cannot meet requested failure detection time of 10000 ms on (inet nxge0) new failure detection time for group "mnic" is 89894 ms
Jan 14 09:13:53 sjcdb475 in.mpathd[413]: [ID 302819 daemon.error] Improved failure detection time 44947 ms on (inet nxge3) for group "mnic"
Jan 14 09:13:53 sjcdb475 in.mpathd[413]: [ID 302819 daemon.error] Improved failure detection time 22473 ms on (inet nxge0) for group "mnic"

On Tue, Jan 20, 2009 at 10:06 PM, Chad Mynhier <cmynh...@gmail.com> wrote:
> On Tue, Jan 20, 2009 at 6:23 AM, Chad Mynhier <cmynh...@gmail.com> wrote:
> >
> > If you don't care about the stack per se, and if it's available to you
> > (I don't know off the top of my head which version this went back
> > into), you could also just aggregate on the kernel function using
> > '@c[func(arg0)]'.
>
> To add a note to this: even though you could do this, it's pretty
> unlikely that it's going to be informative. You'll likely end up
> spending a lot of time in some utility function coming from a number
> of different code paths. Here's some sample output:
>
>   genunix`rm_assize               3642
>   unix`lock_set                   3725
>   unix`mutex_exit                 3922
>   genunix`sleepq_wakeone_chan     4454
>   unix`xc_loop                    5991
>   unix`disp_getwork               6755
>   unix`utl0                       6963
>   unix`fp_restore                 7001
>   FJSV,SPARC64-VI`copyout         7204
>   unix`mutex_enter               12547
>
> It does you no good to know that you spent a lot of time in
> mutex_enter(), because you don't have any information as to why.
> Better information is more likely to be found deeper in the stack.
>
> Chad
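
For reference, the two aggregations discussed in the quoted text could be sketched roughly as follows. This is only a minimal sketch: the 997 Hz sample rate, the 30-second run time, the aggregation names and the stack depth are assumptions chosen for illustration, not values from this thread, and func() is only usable if the installed DTrace version provides it.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

/*
 * Sample every CPU at 997 Hz (assumed rate). For the profile provider,
 * arg0 is the kernel PC and is non-zero only when the CPU was running
 * kernel code, so the predicate keeps kernel samples only.
 */
profile-997
/arg0/
{
	/* Per-function counts: the '@c[func(arg0)]' form from the thread. */
	@byfunc[func(arg0)] = count();

	/* Per-stack counts: shows which code path led to the hot function. */
	@bystack[stack(10)] = count();
}

/* Stop after 30 seconds (assumed duration) and print the top entries. */
tick-30s
{
	trunc(@byfunc, 10);
	trunc(@bystack, 5);
	printa(@byfunc);
	printa(@bystack);
	exit(0);
}
```

The per-function view would give output of the same shape as the sample above; the per-stack view is the one more likely to explain why a function such as mutex_enter() is hot.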
_______________________________________________
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org