shengliang zhang wrote:
I want to trace the process of building the device tree in Solaris.
How to implement this use dtrace?
Firstly this is an OpenSolaris forum so if your question relates to
Solaris 10 then use internal channels. Hint: If you're not on Solaris
10u6 or higher then you really need to be. There were several
put-back's in to S10u6 specifically around libdevinfo, specifically
mt_config_fini(), which greatly improved performance in this area.
Secondly, how deep in to the mechanics do you want to go? Is there a
problem or are you just curious about the mechanics?
I'll give you a starting place and see how you get on.....
Lets assume you want to see what happens to the device tree using
"devfsadm -Cv". You can actually learn a lot using truss - "truss
-leafdDE -o /var/tmp/truss.devfsadm-Cv.out devfsadm -Cv". The more
interesting bit is this:
PID/TID -d -D -E Command = return
------- ------ ------ ------
--------------------------------------------------------
(...snip...)
1919/1: 0.1080 0.0002 0.0000
open("/devices/pseudo/devi...@0:devinfo", O_RDONLY) = 6
1919/1: 0.1082 0.0002 0.0000 ioctl(6, DINFOIDENT,
0x00000000) = 57311
1919/1: 0.3840 0.2758 0.0077 ioctl(6, 0x20DF2F,
0x080473F0) = 155964
1919/1: 0.3842 0.0002 0.0000
brk(0x080EC110) = 0
1919/1: 0.3842 0.0000 0.0000
brk(0x080FE110) = 0
1919/1: 0.3848 0.0006 0.0004 ioctl(6, DINFOUSRLD,
0x080D6000) = 159744
1919/1: 0.3848 0.0000 0.0000
close(6) = 0
This is where we take a snapshot of the device tree, load it in to
user-land and then manipulate it. Note in some older releases of
OpenSolaris and Solaris 10 you may see the following instead:
16456/1: 0.2428 0.0015 0.0001
open("/devices/pseudo/devi...@0:devinfo", O_RDONLY) = 3
16456/1: 0.2908 0.0480 0.0016 ioctl(3, *0xDF81*, 0xFFBFF688)
= 0
16456/1: 0.2911 0.0003 0.0000 close(3)
= 0
16456/1: 0.2914 0.0003 0.0001
open("/devices/pseudo/devi...@0:devinfo", O_RDONLY) = 3
16456/1: 0.2916 0.0002 0.0000 ioctl(3, 0xDF82, 0x00000000)
= 57311
16456/1: 0.4522 0.1606 0.0077 ioctl(3, 0xDF2F, 0xFFBFF214)
= 118056
16456/1: 0.4525 0.0003 0.0000 brk(0x00102000)
= 0
16456/1: 0.4530 0.0005 0.0002 ioctl(3, *0xDF80*, 0x000E2000)
= 122880
16456/1: 0.4532 0.0002 0.0000 close(3)
= 0
Refer to
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/sys/devinfo_impl.h
for info on the 2nd argument to the ioctl() calls.
46 #define DIIOC (0xdf<<8)
73 #define DINFOUSRLD (DIIOC | 0x80) /* copy snapshot to
usrland */
74 #define DINFOLODRV (DIIOC | 0x81) /* force load a driver */
75 #define DINFOIDENT (DIIOC | 0x82) /* identify the driver */
Note 0xDF2F is an OR of the following:
60 #define DINFOFORCE (DIIOC | 0x20) /* force load all
drivers */
53 #define DINFOSUBTREE (DIIOC | 0x01) /* include subtree */
54 #define DINFOMINOR (DIIOC | 0x02) /* include minor data */
55 #define DINFOPROP (DIIOC | 0x04) /* include properties */
56 #define DINFOPATH (DIIOC | 0x08) /* include i/o pathing
information */
Now go and read the libdevinfo(3LIB) man page. This tells you most of
what you want you need to know before delving deeper with DTrace. Also
read the source code as it's reasonably well commented. In OSol and
latish releases of S10 you should have the fbt:devinfo:: probes. Find
out what you've got to play with by listing them:
$ dtrace -ln 'fbt:devinfo::'
If, for example, you wanted to see the flow of all the ioctl and devinfo
calls that devfsadm calls during its execution use:
$ pfexec dtrace -F -o /var/tmp/dtrace.devfsadm-Cv.out \
-n 'syscall::ioctl:entry /execname == "devfsadm"/ {printf("->
%s(%lx,%lx,%lx,%lx,%lx)\n", probefunc,arg0,arg1,arg2,arg3,arg4);
self->interested=1;}' \
-n 'syscall::ioctl:return /self->interested/ {}' \
-n 'fbt:devinfo:: /self->interested/ {}' \
-n 'dtrace:::END {self->interest=0;}' \
-c "devfsadm -Cv"
That output, even on a small system, is going to be quite large so you'd
be better to do some aggregation to get an idea of where you want to go
next. Some nice quantize() graphs on the delta between entry and return
probes would be a good idea. You can then drill down on the ones which
take the longest. The graphs also give you an idea of what values you
need when creating the predicate on the delta in the return probe
ie:
Provider:Module:Function:return
/delta >= some_value/
{do something useful}
Most people are only interested in this subject if devfsadm is taking a
long time. If this is you, chances are the time will be consumed in one
or both of the following places:
3737/1: 0.2940 0.0049 0.0011 open("/devices/pseudo/devi...@0:devinfo",
O_RDONLY) = 3
*3737/1: 4.0493 3.7553 0.0018 ioctl(3, 0xDF81, 0xFFBFF5C0) = 0 <<<<*
3737/1: 4.0496 0.0003 0.0000 close(3) = 0
3737/1: 4.0499 0.0003 0.0001 open("/devices/pseudo/devi...@0:devinfo",
O_RDONLY) = 3
3737/1: 4.0502 0.0003 0.0000 ioctl(3, 0xDF82, 0x00000000) = 57311
*3737/1: 7.9125 3.8623 0.0092 ioctl(3, 0xDF2F, 0xFFBFF14C) = 125472 **<<<<*
Note that the elapsed time of the ioctl call is very short, but the
delta is high. This is because the system call was kicked off CPU
waiting for a event to happen. The event is not the ioctl itself but
something which happens as a result of the ioctl, typically somewhere
within one of the drivers. Using "on-off-cpu.d" you'll be able to get
the stack which caused the thread to be kicked off cpu. From that you
can then include further probes and predicates to look deeper. You'll
most likely find that this thread is waiting on a
mt_config_fini()->cv_wait(). The condition variable is issued because
we have one or more config_grand_children()->mt_config_fini() threads
that need to complete. You'll need a threadlist (or crash dump) to
determine how many threads we're waiting for and why. Hint: use the
pstop(), system(), and prun() commands within your D script to allow you
to collect the data you want at the appropriate time. The original
thread cannot continue until all it's grand children have completed (ie:
mtc_thr_count is zero). This is why it's kicked off cpu.
The most common cause for devfsadm taking a long time is when systems
have large numbers disk device targets configured, but not used. The
reason is that for each target the driver will send an SCSI INQUIRY
packet which has a 250millisecond timeout. It doesn't take long to rack
up multiple seconds, or even minutes, of delay. The easiest way to see
this particular case in action is using scsi.d:
$ ./scsi.d-1.18 mpt 1
Hit Control C to interrupt
00007.657554003 mpt1:-> 0x12 INQUIRY address 00:00, lba 0x000000, len
0x000030, control 0x00 timeout 60 CDBP 60028113348 1 sched(0) cdb(6) 120000003000
00007.907823387 mpt1:<- 0x12 INQUIRY address 00:00, lba 0x000000, len
0x000030, control 0x00 timeout 60 CDBP 60028113348, reason 0x1 (INCOMPLETE)
pkt_state 0x1 state 0x0 Success *Time 250390us*
00007.907969358 mpt1:-> 0x12 INQUIRY address 01:00, lba 0x000000, len
0x000030, control 0x00 timeout 60 CDBP 60028113348 1 sched(0) cdb(6) 120000003000
00008.158201728 mpt1:<- 0x12 INQUIRY address 01:00, lba 0x000000, len
0x000030, control 0x00 timeout 60 CDBP 60028113348, reason 0x1 (INCOMPLETE)
pkt_state 0x1 state 0x0 Success *Time 250321us*
00008.158333640 mpt1:-> 0x12 INQUIRY address 02:00, lba 0x000000, len
0x000030, control 0x00 timeout 60 CDBP 60028113348 1 sched(0) cdb(6) 120000003000
00008.408558541 mpt1:<- 0x12 INQUIRY address 02:00, lba 0x000000, len
0x000030, control 0x00 timeout 60 CDBP 60028113348, reason 0x1 (INCOMPLETE)
pkt_state 0x1 state 0x0 Success *Time 250313us*
(...snip...)
If this is you, then reduce the number of available targets configured
in the appropriate driver (sd.conf, ssd.conf, mpt.conf, etc)
You should have enough info now to go diving in to devinfo. Best of luck.
HTH
Steve
_______________________________________________
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org