On Fri, Jul 30, 1999 at 02:01:19AM -0700, Tim Moore wrote:
> Take any drive and divide it up evenly into 6-8 partitions. Perform a
> simple dd(1) or hdparm(8) read test from each and watch the sustained
> speed drop by about 20% from the outside to the inside most partition.
>
> It's a 5400 RPM drive and will not deliver more than ~10MB/s.
The rotational speed is not the sole determining factor:
if the recording density per track goes up while the rotational speed
stays the same, the sustained throughput goes up as well.
This is why I think today's drives don't really need UDMA-66,
but tomorrow's drives will.
Here are some raid0 examples from my test system.
In summary, there's a fair amount of measurement variation, but I see
something like 9-15MB/s on a single drive, depending on which end
of the drive is being accessed. I get similar results from dd
and from hdparm -t. When I put raid0 on 8 of these drives,
I get 28-35MB/s accessing /dev/md*, depending on whether I stripe
the fast or slow end of the devices. But when I access a file
on a mounted filesystem on those same md* devices, I get 38MB/s to 60MB/s.
Some possible questions:
1) Why do I get about the same performance with 8 drives as with 6?
There is no evidence of CPU saturation. I changed from a p2/400
to a p3/450 (12.5% clock increase), and got maybe 3% improvement.
One hypothesis is that the overhead of copying from kernel to user
space dominates. I will try to find out (e.g., by trying the raw I/O
patches or other hacks).
2) Why does reading a file go so much faster than reading /dev/md*?
The same thing does not seem true for /dev/hd*.
I still guess it's some difference in readahead, but I have not
verified that yet.
3) Why is raid0 on N disks faster than raid5 on N+1 disks for reading,
when there are no failures? This is the question I asked before,
and still seems to be open. I wonder if the answer is related to
my question 2.
Anyway, here is the boring stuff.
# ./hdparm -i /dev/hde
/dev/hde:
Model=ST317242A, FwRev=3.09, SerialNo=7BP0180T
Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
RawCHS=33416/16/63, TrkSize=0, SectSize=0, ECCbytes=0
BuffType=0(?), BuffSize=512kB, MaxMultSect=16, MultSect=off
DblWordIO=no, maxPIO=2(fast), DMA=yes, maxDMA=2(fast)
CurCHS=33416/16/63, CurSects=33683328, LBA=yes, LBAsects=33683328
tDMA={min:120,rec:120}, DMA modes: mword0 mword1 mword2
IORDY=on/off, tPIO={min:240,w/IORDY:120}, PIO modes: mode3 mode4
UDMA modes: mode0 mode1 mode2 mode3 *mode4
Drive Supports : ATA-1 ATA-2 ATA-3 ATA-4
# fdisk -l /dev/hde
Disk /dev/hde: 16 heads, 63 sectors, 33416 cylinders
Units = cylinders of 1008 * 512 bytes
Device Boot Start End Blocks Id System
/dev/hde1 1 2081 1048792+ 83 Linux
/dev/hde2 2082 31335 14744016 83 Linux
/dev/hde3 31336 33416 1048824 83 Linux
# time -f%e dd bs=32k count=32k of=/dev/null if=/dev/hde1
32768+0 records in
32768+0 records out
66.50
# echo scale=2\; 1024 / 66.5 | bc
15.39
# time -f%e dd bs=32k count=32k of=/dev/null if=/dev/hde3
32768+0 records in
32768+0 records out
111.42
# echo scale=2\; 1024 / 111.42 | bc
9.19
# ./hdparm -tT /dev/hde[13]
/dev/hde1:
Timing buffer-cache reads: 128 MB in 1.12 seconds =114.29 MB/sec
Timing buffered disk reads: 64 MB in 4.21 seconds =15.20 MB/sec
/dev/hde3:
Timing buffer-cache reads: 128 MB in 1.00 seconds =128.00 MB/sec
Timing buffered disk reads: 64 MB in 6.85 seconds = 9.34 MB/sec
Ok, so all that establishes the basic speed of a single disk, reading
both inner and outer cylinders, and it shows that hdparm -t reports
similar numbers to dd.
# cat /proc/mdstat
Personalities : [raid0]
read_ahead 1024 sectors
md2 : active raid0 hds3[7] hdq3[6] hdo3[5] hdm3[4] hdk3[3] hdi3[2] hdg3[1] hde3[0]
8389632 blocks 32k chunks
md1 : active raid0 hds2[7] hdq2[6] hdo2[5] hdm2[4] hdk2[3] hdi2[2] hdg2[1] hde2[0]
117951488 blocks 32k chunks
md0 : active raid0 hds1[7] hdq1[6] hdo1[5] hdm1[4] hdk1[3] hdi1[2] hdg1[1] hde1[0]
8389632 blocks 32k chunks
unused devices: <none>
# time -f%e dd bs=32k count=32k of=/dev/null if=/dev/md0
32768+0 records in
32768+0 records out
31.07
# echo scale=2\; 1024 / 31.07 | bc
32.95
# time -f%e dd bs=32k count=32k of=/dev/null if=/dev/md2
32768+0 records in
32768+0 records out
36.16
# echo scale=2\; 1024 / 36.16 | bc
28.31
# ./hdparm -t /dev/md[02]
/dev/md0:
Timing buffered disk reads: 64 MB in 1.83 seconds =34.97 MB/sec
/dev/md2:
Timing buffered disk reads: 64 MB in 2.26 seconds =28.32 MB/sec
So all that establishes that md gives me only 28-35MB/s when reading
directly. Now, let's look at file performance.
# mount
/dev/hda2 on / type ext2 (rw)
none on /proc type proc (rw)
none on /dev/pts type devpts (rw,mode=0622)
/dev/md0 on /mnt/mnt type ext2 (rw)
/dev/md2 on /mnt/mnt2 type ext2 (rw)
# time -f%e dd bs=32k count=32k of=/dev/null if=/mnt/mnt/junkusr.tar
32768+0 records in
32768+0 records out
20.70
# echo scale=2\; 1024 / 20.70 | bc
49.46
# time -f%e dd bs=32k count=32k of=/dev/null if=/mnt/mnt2/junkusr.tar
32768+0 records in
32768+0 records out
17.12
# echo scale=2\; 1024 / 17.12 | bc
59.81
That shows that file performance is much higher, but also, curiously,
that the filesystem on /dev/md2 is the faster one, even though it sits
on the slower cylinders, as the /dev/hde[13] tests above indicate.
Strange.
Jan Edler
NEC Research Institute