Hi Paul -

> One thing I have been puzzled with a lot this weekend is the information and
> plot in Figure 4.7. This section, if I understand it correctly, offers the
> means to track the actual times from when an IO starts in the kernel to when
> it completes, implying the time to either read or write from disk or memory
> cache.
>
> I have been using a data file for an Oracle database as the test subject for
> this work.
>
> I have several confusion points with this section.
>
> • The graph mentions 'strategy' and 'biodone', which seem to imply TNF-based
> data, not output from the DTrace script above Figure 4.7.
The DTrace script uses the IO provider io:::start and io:::done probes.
io:genunix::start enables several probes, including one in the bdev_strategy
kernel routine:

nv98> dtrace -n 'io:genunix::start'
dtrace: description 'io:genunix::start' matched 3 probes
CPU     ID                    FUNCTION:NAME
  0  22920              bdev_strategy:start
  0  22920              bdev_strategy:start
  0  22920              bdev_strategy:start

io:genunix::done enables a probe in biodone:

nv98> dtrace -n 'io:genunix::done'
dtrace: description 'io:genunix::done' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  22908                     biodone:done

The actions in the DTrace script gather timestamps and block numbers at each
firing (assuming the predicates evaluate true).

> • In looking at the data gotten from the DTrace script I see no way to
> generate the graph of Figure 4.7. Specifically the time difference between
> '0' and the points for 'strategy'. With the DTrace script we have the start
> time of the IO. I see no way to determine some amount of time between 'start'
> and something earlier. The time values on the 'X' axis also don't fall out of
> the data generated by the DTrace script.

The script in the book generates output that looks like this:

nv98> ./iot.d
122065039977,2100,
122065039980,,2100
122070310632,72,
122070310637,,72
122070310833,81,
122070310836,,81
. . .

The value on the left is the timestamp; the value on the right is the block
number. The data was imported into a spreadsheet, and the math was done on the
start time and stop (done) time for each block, resulting in IO times on a
per-block basis.

> • How can it be determined if the IO completed from a memory cache or
> required an I/O to a physical disk? I have a lot of times less than 0.5 ms
> but also have a fair number that are in the range of 1 ms to 300 ms.

The lookup in the file system memory cache happens above the bio layer. Put
another way, if we're hitting bdev_strategy, we need to do a disk IO to get
the data (we already missed in the cache).
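If you'd rather not round-trip through a spreadsheet, the same pairing and
subtraction can be done with a short script. Here's a rough sketch in Python;
the field layout ("timestamp,block," for start records, "timestamp,,block" for
done records, nanosecond timestamps) is assumed from the sample output above,
and the function name is mine, not from the book:

```python
def io_times(lines):
    """Pair iot.d-style start/done records and return per-block I/O times."""
    starts = {}   # block number -> start timestamp (ns)
    times = []    # (block, elapsed_ns), in completion order
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) < 3:
            continue
        ts = int(fields[0])
        if fields[1]:                      # "ts,block,"  -> start record
            starts[fields[1]] = ts
        elif fields[2] in starts:          # "ts,,block"  -> done record
            times.append((fields[2], ts - starts.pop(fields[2])))
    return times

# Using the sample lines from the book's script output:
sample = [
    "122065039977,2100,",
    "122065039980,,2100",
    "122070310632,72,",
    "122070310637,,72",
]
for blk, ns in io_times(sample):
    print("block %s: %.3f ms" % (blk, ns / 1e6))
```

Note this keys strictly on block number, so it assumes one outstanding IO per
block at a time — the same assumption the spreadsheet pairing makes.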
HTH,

Thanks,
/jim

> I modified the script to dump out the size of the I/O being done, and it was
> interesting to see some unexpected lengths. I also added 'start' and 'end' to
> the appropriate lines as a sanity check, to make it easier to pair up the
> entries. There should always be one start/end pair for each block address.
>
> I have attached an Excel spreadsheet with what I was able to create based on
> the data collected.
>
> My thanks for any clarifications to these confusions.
>
> pdc

_______________________________________________
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org