Hi Paul -
>
> One thing I have been puzzled by a lot this weekend is the information and 
> plot in Figure 4.7. This section, if I understand it correctly, offers the 
> means to track the actual times from when an IO starts in the kernel to when 
> it completes, implying the time to either read or write from disk or the 
> memory cache. 
>
> I have been using a data file for an Oracle database as the test subject for 
> this work.
>
> I have several confusion points with this section.
>
> •     The graph mentions ‘strategy’ and ‘biodone’, which seem to imply 
> TNF-based data, not output from the DTrace script above Figure 4.7. 
>   

The DTrace script uses the IO provider's io:::start and io:::done probes.
io:genunix::start enables several probes, including one in the bdev_strategy 
kernel routine:

nv98> dtrace -n 'io:genunix::start'
dtrace: description 'io:genunix::start' matched 3 probes

CPU     ID                    FUNCTION:NAME
  0  22920            bdev_strategy:start
  0  22920            bdev_strategy:start
  0  22920            bdev_strategy:start

io:genunix::done enables a probe in biodone:

nv98> dtrace -n 'io:genunix::done'
dtrace: description 'io:genunix::done' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  22908                   biodone:done


The actions in the DTrace script gather timestamps and block numbers at 
each firing (assuming the predicates evaluate true).
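
For reference, the core logic of such a script can be sketched in D as 
follows. This is a minimal sketch, not necessarily the exact script from 
the book (the book's version would add predicates to filter on the device 
or file of interest):

```
#!/usr/sbin/dtrace -s
#pragma D option quiet

/* IO handed to the driver (fires in bdev_strategy, among others) */
io:genunix::start
{
        printf("%d,%d,\n", timestamp, args[0]->b_blkno);
}

/* IO completion (fires in biodone) */
io:genunix::done
{
        printf("%d,,%d\n", timestamp, args[0]->b_blkno);
}
```

Here args[0] is the bufinfo_t for the IO, b_blkno is its block number, 
and the DTrace timestamp variable counts nanoseconds.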

> •     In looking at the data obtained from the DTrace script, I see no way to 
> generate the graph of Figure 4.7, specifically the time difference between 
> ‘0’ and the points for ‘strategy’. With the DTrace script we have the start 
> time of the IO; I see no way to determine some amount of time between ‘start’ 
> and something earlier. The time values on the ‘X’ axis also don’t fall out of 
> the data generated by the DTrace script.
>   
The script in the book generates output that looks like this:
nv98> ./iot.d
122065039977,2100,
122065039980,,2100
122070310632,72,
122070310637,,72
122070310833,81,
122070310836,,81
. . .

The value on the left is the timestamp; the value on the right is the 
block number. The data was imported into a spreadsheet, and the math was 
done on the start time and stop (done) time for each block, resulting in 
IO times on a per-block basis.
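
Incidentally, the subtraction could also be done inside DTrace itself, 
keying the start time by block number. A minimal sketch (it assumes a 
given block number has only one IO outstanding at a time):

```
#!/usr/sbin/dtrace -s
#pragma D option quiet

io:genunix::start
{
        start_ts[args[0]->b_blkno] = timestamp;
}

io:genunix::done
/start_ts[args[0]->b_blkno]/
{
        printf("block %d: %d ns\n", args[0]->b_blkno,
            timestamp - start_ts[args[0]->b_blkno]);
        start_ts[args[0]->b_blkno] = 0;
}
```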

> •     How can it be determined if the IO completed from a memory cache or 
> required an I/O to a physical disk? I have a lot of times less than 0.5 ms 
> but also have a fair number that are in the range of 1 ms to 300 ms.
>   

The lookup in the file system memory cache happens above the bio layer. 
Put another way, if we're hitting bdev_strategy, we need to do a disk IO 
to get the data (we already missed in the cache).

HTH,

Thanks,
/jim


> I modified the script to dump out the size of the I/O being done, and it was 
> interesting to see some unexpected lengths. I also added ‘start’ and ‘end’ to 
> the appropriate lines as a sanity check, to make it easier to pair up the 
> entries. There should always be one start/end pair for a given block address.
>
> I have attached an Excel spreadsheet with what I was able to create based on 
> the data collected.
>
> My thanks for any clarifications to these confusions.
>
> pdc
>   
> ------------------------------------------------------------------------
>
> _______________________________________________
> dtrace-discuss mailing list
> dtrace-discuss@opensolaris.org