I think I've found some code in /usr/src/uts/common/io/lvm/raid/raid.c that 
appears to avoid reading parity back from the disk when a "full line" (?) 
write is done, instead calculating the new parity for the line directly from 
the full line of data about to be written.  But I'm not sure what conditions 
are required to trigger this path...
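
My reading of the check, paraphrased (this is NOT the actual raid.c code, 
just a sketch of what I think the condition amounts to; the function name 
and parameters here are my own invention):

#include <stddef.h>
#include <sys/types.h>

/*
 * Sketch only, with invented names, not the real raid.c logic.  With
 * `columns' total columns (one chunk per line holding parity) and an
 * interlace of `interlace' bytes, one line carries
 * (columns - 1) * interlace bytes of user data.
 */
int
is_full_line_write(off_t off, size_t len, int columns, size_t interlace)
{
        size_t line_data = (size_t)(columns - 1) * interlace;

        /* start on a line boundary and cover exactly one full line */
        return (off % (off_t)line_data == 0 && len == line_data);
}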

For instance, on my test system:
maxphys = 1024k, set in /etc/system and then checked after reboot with mdb
d0 is a RAID-5 volume with 4 slices, all identical geometry, interlace=64k
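
If I understand "full line" correctly, that geometry means each line carries 
3 data chunks plus one parity chunk, i.e. 192k of user data per line, which 
a 1024k write doesn't even divide evenly into.  A quick check (my arithmetic, 
not anything from the driver):

#include <stdio.h>

int
main(void)
{
        int columns = 4;            /* slices in d0 */
        double interlace = 64.0;    /* interlace in kbytes */
        double line_data = (columns - 1) * interlace;

        printf("user data per full line: %.0fk\n", line_data);   /* 192k */
        printf("full lines per 1024k write: %.2f\n", 1024.0 / line_data);
        return (0);
}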

If I try something like

-bash-3.00# time dd if=/dev/zero of=/dev/md/rdsk/d0 bs=1024k count=1024

and simultaneously check I/O stats:

-bash-3.00# iostat -xnz 2

the sustained numbers look like this:

                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   96.0  190.5 1413.3 12239.6  0.0  0.8    0.1    2.8   2  62 c1d0
   95.0  190.0 1349.3 12207.6  0.0  0.9    0.0    3.2   0  64 c1d1
   85.0  169.5  709.3 10890.3  0.0  0.7    0.0    2.8   0  58 c2d0
   84.0  169.0  677.0 10858.3  0.0  0.6    0.0    2.4   0  52 c2d1
    0.0   16.0    0.0 16384.1  0.0  1.0    0.0   61.4   0  98 d0

and the wall-clock time for the whole "dd" run bears out the 16MB/sec average 
write speed to d0.
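
Summing the component numbers from that sample (my own back-of-envelope 
check, using the figures above):

#include <stdio.h>

int
main(void)
{
        /* kw/s and kr/s for c1d0, c1d1, c2d0, c2d1 from the sample above */
        double kw[] = { 12239.6, 12207.6, 10890.3, 10858.3 };
        double kr[] = { 1413.3, 1349.3, 709.3, 677.0 };
        double w = 0.0, r = 0.0;
        int i;

        for (i = 0; i < 4; i++) {
                w += kw[i];
                r += kr[i];
        }

        /* d0 itself took 16384.1 kw/s of user data in the same interval */
        printf("component writes: %.0fk/s (%.2fx the user data)\n",
            w, w / 16384.1);
        printf("component reads: %.0fk/s\n", r);
        return (0);
}

That comes out to roughly 46MB/s of component writes (about 2.8x the 16MB/s 
of user data) plus about 4MB/s of reads.  Full-line writes on four columns 
should cost only 4/3 of the user data in writes and essentially no reads at 
all.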

This sure looks to me like even big 1MB writes to the raw device are getting 
chopped up into 64k pieces and written via read-modify-write rather than a 
whole stripe at a time.  Is this the way it's meant to work?

Puzzled,
   Jason =:^/