I mentioned this idea on this list a few weeks ago: allowing an sg pass-through request to use the mmap-ed reserve buffer associated with another sg file descriptor.
In my experience mmap-ed IO using sg's reserve buffer mapped
into the user space is faster than direct IO schemes. However
one shortcoming is that if you try to copy between two devices
using this technique then you end up with two separate mmap-ed
buffers in the user space program. Then the user space program
needs to copy between the two buffers, which defeats much
of the advantage of mmap-ed IO. You could (and sgm_dd in
sg3_utils does) use mmap-ed IO on the read side and direct IO
on the write side (or vice versa).
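For comparison, the existing (non-shared) mmap-ed scheme on one side of a copy looks roughly like this with the sg v3 interface: size the reserve buffer with SG_SET_RESERVED_SIZE, mmap() the sg file descriptor, then issue SG_IO with SG_FLAG_MMAP_IO. This is a minimal sketch; the helper names, the READ(10) CDB and the 60 second timeout are illustrative assumptions, not code taken from sgm_dd.

```c
#include <scsi/sg.h>     /* sg_io_hdr_t, SG_IO, SG_SET_RESERVED_SIZE, ... */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

/* Some older copies of <scsi/sg.h> lack this; 4 is the kernel's value. */
#ifndef SG_FLAG_MMAP_IO
#define SG_FLAG_MMAP_IO 4
#endif

#define SENSE_LEN 32     /* size of the caller's sense buffer */

/* Hypothetical helper: ask sg for a reserve buffer of at least 'len' bytes
 * and map it into user space. Returns MAP_FAILED on error. */
void *map_reserve_buffer(int sg_fd, int len)
{
    int reserved = len;

    if (ioctl(sg_fd, SG_SET_RESERVED_SIZE, &reserved) < 0)
        return MAP_FAILED;
    /* read back what the driver actually reserved; it may be less */
    if (ioctl(sg_fd, SG_GET_RESERVED_SIZE, &reserved) < 0 || reserved < len)
        return MAP_FAILED;
    return mmap(NULL, (size_t)len, PROT_READ | PROT_WRITE, MAP_SHARED,
                sg_fd, 0);
}

/* Hypothetical helper: prepare a READ(10) of 'blocks' blocks at 'lba'.
 * With SG_FLAG_MMAP_IO the data lands in the mmap-ed reserve buffer, so
 * dxferp is not used. Submit afterwards with ioctl(sg_fd, SG_IO, hdr). */
void build_mmap_read(sg_io_hdr_t *hdr, unsigned char cdb[10],
                     unsigned char sense[SENSE_LEN], uint32_t lba,
                     uint16_t blocks, unsigned int block_size)
{
    memset(cdb, 0, 10);
    cdb[0] = 0x28;                           /* READ(10) opcode */
    cdb[2] = (unsigned char)(lba >> 24);     /* big-endian LBA */
    cdb[3] = (unsigned char)(lba >> 16);
    cdb[4] = (unsigned char)(lba >> 8);
    cdb[5] = (unsigned char)lba;
    cdb[7] = (unsigned char)(blocks >> 8);   /* transfer length in blocks */
    cdb[8] = (unsigned char)blocks;

    memset(hdr, 0, sizeof(*hdr));
    hdr->interface_id = 'S';
    hdr->dxfer_direction = SG_DXFER_FROM_DEV;
    hdr->cmd_len = 10;
    hdr->cmdp = cdb;
    hdr->mx_sb_len = SENSE_LEN;
    hdr->sbp = sense;
    hdr->dxfer_len = (unsigned int)blocks * block_size;
    hdr->dxferp = NULL;                      /* not used for mmap-ed IO */
    hdr->timeout = 60000;                    /* milliseconds */
    hdr->flags = SG_FLAG_MMAP_IO;
}
```

A caller would map the buffer once (bs*bpt bytes), then loop over build_mmap_read(), ioctl(fd, SG_IO, &hdr) and a check of hdr.status and hdr.info. With two devices this gives two such mappings, which is the problem described above.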
I used the sg driver as found in lk 2.6.21-rc4 as a baseline
(and I don't think sg has changed since 2.6.19). A gzipped
diff is attached. There is also some test code (a modified
sgm_dd) in the sg3_utils-1.24 beta on the www.torque.net/sg site.
Here is an example of a disk-to-disk copy:
sgm_dd if=/dev/sg0 of=/dev/sg1 oflag=smmap bs=512
The new option is 'oflag=smmap', which makes the write SG_IO
on /dev/sg1 set SG_FLAG_SHARED_MMAP_IO and pass the mmap-ed
buffer used for /dev/sg0 in dxferp. [Add the 'verbose=1'
option and sgm_dd reports how many times shared mmap IO was
requested and how many times it was actually done.]
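The write side of 'oflag=smmap' can be sketched roughly as follows. SG_FLAG_SHARED_MMAP_IO comes from the attached patch, not from mainline <scsi/sg.h>; the numeric value below is a placeholder assumed only so the sketch compiles, so take the real one from the patched header. The helper name and the WRITE(10) CDB are likewise illustrative assumptions.

```c
#include <scsi/sg.h>     /* sg_io_hdr_t, SG_DXFER_TO_DEV, ... */
#include <stdint.h>
#include <string.h>

/* Defined by the proposed patch, not by mainline <scsi/sg.h>. The value
 * here is a placeholder for illustration; use the patched header's value. */
#ifndef SG_FLAG_SHARED_MMAP_IO
#define SG_FLAG_SHARED_MMAP_IO 0x8
#endif

#define SENSE_LEN 32

/* Hypothetical helper: prepare a WRITE(10) on the output sg fd whose data
 * comes from 'src_mmap', the reserve buffer mmap-ed on the input sg fd
 * (/dev/sg0 in the example). Submit with ioctl(out_fd, SG_IO, hdr). */
void build_shared_mmap_write(sg_io_hdr_t *hdr, unsigned char cdb[10],
                             unsigned char sense[SENSE_LEN],
                             void *src_mmap, uint32_t lba,
                             uint16_t blocks, unsigned int block_size)
{
    memset(cdb, 0, 10);
    cdb[0] = 0x2a;                           /* WRITE(10) opcode */
    cdb[2] = (unsigned char)(lba >> 24);     /* big-endian LBA */
    cdb[3] = (unsigned char)(lba >> 16);
    cdb[4] = (unsigned char)(lba >> 8);
    cdb[5] = (unsigned char)lba;
    cdb[7] = (unsigned char)(blocks >> 8);   /* transfer length in blocks */
    cdb[8] = (unsigned char)blocks;

    memset(hdr, 0, sizeof(*hdr));
    hdr->interface_id = 'S';
    hdr->dxfer_direction = SG_DXFER_TO_DEV;
    hdr->cmd_len = 10;
    hdr->cmdp = cdb;
    hdr->mx_sb_len = SENSE_LEN;
    hdr->sbp = sense;
    hdr->dxfer_len = (unsigned int)blocks * block_size;
    /* the other fd's mmap-ed buffer; it is still ordinary user memory, so
     * an indirect IO fallback can presumably copy from the same address */
    hdr->dxferp = src_mmap;
    hdr->timeout = 60000;                    /* milliseconds */
    hdr->flags = SG_FLAG_SHARED_MMAP_IO;
}
```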
Features:
- allow both sides of a copy-like operation to DMA into
and out of the same user space buffer
- minimal per command overhead (e.g. no scatter gather
list building or page pinning per command)
- could copy a single source to multiple destinations
efficiently
- if the shared reserve buffer is unavailable (or not big
enough), fall back to indirect IO transparently
- new info bit SG_INFO_SHARED_MMAP_IO indicates whether
shared mmap-ed IO was done
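Because the fallback is transparent, only the info bit tells the caller what actually happened. A sketch of counting requests versus successes, in the spirit of what sgm_dd's 'verbose=1' reports, might look like this. Both macro names come from the patch, but their values here are placeholders assumed for illustration, and the struct and function names are hypothetical.

```c
#include <scsi/sg.h>     /* sg_io_hdr_t */

/* Both names come from the proposed patch; the values are placeholders
 * assumed for illustration only. */
#ifndef SG_FLAG_SHARED_MMAP_IO
#define SG_FLAG_SHARED_MMAP_IO 0x8
#endif
#ifndef SG_INFO_SHARED_MMAP_IO
#define SG_INFO_SHARED_MMAP_IO 0x8
#endif

/* Hypothetical counters for shared mmap IO requests and outcomes. */
struct smmap_stats {
    unsigned long requested;  /* SG_IO calls that asked for shared mmap IO */
    unsigned long done;       /* ... and that the driver actually honoured */
};

/* Call after a successful ioctl(fd, SG_IO, hdr). Requests that fell back
 * to indirect IO are counted as requested but not as done. */
void smmap_account(struct smmap_stats *st, const sg_io_hdr_t *hdr)
{
    if (hdr->flags & SG_FLAG_SHARED_MMAP_IO) {
        st->requested++;
        if (hdr->info & SG_INFO_SHARED_MMAP_IO)
            st->done++;
    }
}
```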
Restrictions (enforced by the sg driver):
- confined to file descriptors in the same process
- there can be only one user of a reserve buffer
at a time
- low_dma is honoured
Complexity:
- it does have a few more corner cases than usual. For
example, in the above sgm_dd invocation: closing /dev/sg0
while /dev/sg1 is sharing its mmap-ed reserve buffer ...
Here are some timings copying between two ramdisks. It is
assumed the 'bs=8k' given to dd is equivalent to 'bs=512
bpt=16' given to sgm_dd.
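The equivalence rests on the amount of data moved per SCSI command: sg_dd and sgm_dd transfer bpt blocks of bs bytes each per command, while the 'k' suffix in dd's bs=8k means 1024 bytes:

```latex
\underbrace{512}_{\texttt{bs}} \times \underbrace{16}_{\texttt{bpt}}
  = 8192\ \text{bytes} = 8 \times 1024\ \text{bytes} \;\;(\texttt{bs=8k}\ \text{for dd})
```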
# lsscsi -g
[4:0:0:0] disk Linux scsi_debug 1.82 /dev/sda /dev/sg0
[5:0:0:0] disk Linux scsi_ses 1.06 /dev/sdb /dev/sg1
# ./dd_tsts.sh
Usage: dd_tsts.sh <ifile> <ofile> <times> <bs>
# ./dd_tsts.sh /dev/sda /dev/sdb 50 8k
Indirect IO with dd
dd if=/dev/sda of=/dev/sdb bs=8k
real 0m7.448s
user 0m0.080s
sys 0m7.046s
Direct IO with dd
dd if=/dev/sda iflag=direct of=/dev/sdb oflag=direct bs=8k
real 0m4.529s
user 0m0.114s
sys 0m3.799s
# ./sg_dd_tsts.sh /dev/sg0 /dev/sg1 50 16
Indirect IO with sg_dd
sg_dd if=/dev/sg0 of=/dev/sg1 bs=512 bpt=16
real 0m6.304s
user 0m0.171s
sys 0m5.268s
Direct IO with sg_dd
sg_dd if=/dev/sg0 iflag=dio of=/dev/sg1 oflag=dio bs=512 bpt=16
real 0m4.246s
user 0m0.135s
sys 0m3.395s
Mmap read, indirect IO write with sgm_dd
sgm_dd if=/dev/sg0 of=/dev/sg1 bs=512 bpt=16
real 0m4.023s
user 0m0.127s
sys 0m3.259s
Mmap read, direct IO write with sgm_dd
sgm_dd if=/dev/sg0 of=/dev/sg1 oflag=dio bs=512 bpt=16
real 0m4.057s
user 0m0.164s
sys 0m3.264s
Mmap read, shared mmap write with sgm_dd
sgm_dd if=/dev/sg0 of=/dev/sg1 oflag=smmap bs=512 bpt=16
real 0m3.871s
user 0m0.131s
sys 0m3.111s
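Expressed as reductions in real (elapsed) time, the shared mmap run compares with the others as follows (simple arithmetic on the single set of figures above):

```latex
\begin{aligned}
\text{vs. dd, indirect IO:}\quad & \frac{7.448 - 3.871}{7.448} \approx 48\% \\
\text{vs. dd, direct IO:}\quad & \frac{4.529 - 3.871}{4.529} \approx 14.5\% \\
\text{vs. sg\_dd, direct IO:}\quad & \frac{4.246 - 3.871}{4.246} \approx 8.8\% \\
\text{vs. sgm\_dd, mmap read + indirect write:}\quad & \frac{4.023 - 3.871}{4.023} \approx 3.8\%
\end{aligned}
```

On these ramdisks the margin over the best non-shared sgm_dd variant is under 4%.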
Don't expect drastic improvements with real devices unless
their throughput is in the gigabyte per second range.
Doug Gilbert
sg2621rc4smm2.diff.gz

