I have not fully read your explanations, but you stated that the
theoretical performance of RAID-1 is 200% of a normal disk. That is
true in a multi-user environment, where two users are reading
simultaneously and you add both of their read numbers together.
In a single-user sequential read, you will get no better performance
than a normal disk, and I would suspect that, with the overheads of
the RAID device, you will see some degradation. A 25% hit seems like
more than "some", though.
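A rough way to see the difference (device name as in the tests below;
the numbers only mean much if the reads actually hit the disks rather
than the cache):

    # one sequential stream: expect roughly single-disk throughput
    dd if=/dev/md0 of=/dev/null bs=64k count=2048

    # two concurrent streams at different offsets: the mirror can serve
    # each from a different disk, so the summed rate can approach 2x
    dd if=/dev/md0 of=/dev/null bs=64k count=2048 &
    dd if=/dev/md0 of=/dev/null bs=64k count=2048 skip=8192 &
    wait
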
I have not fully examined the RAID code, but it is also possible that
it does double buffering, meaning one cache buffer is allocated for
the MD device and another for the underlying block device. That adds
a memory copy into the mix and degrades performance.
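If that is what is happening, a quick hint (a guess at a check, not a
verdict) would be to compare cached reads through the two paths:

    # an extra in-memory copy in MD should show up as lower
    # buffer-cache throughput on md0 than on its component device
    hdparm -T /dev/md0 /dev/sda7
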
Just food for thought. I'll probably know the answers to these
questions in the next month or so as I become more proficient in Linux
RAID.
Clay
[EMAIL PROTECTED] wrote:
>
> Tim Moore <[EMAIL PROTECTED]> writes with a number of great questions:
>
> > > The trouble I'm having is that my RAID-1 read performance appears
> > > substantially worse than just using the same disk partitions directly.
> >
> > How did you measure bonnie performance on the raw partitions?
>
> Sorry, the way I worded that was confusing. I didn't use the raw
> partitions; I just did a 'raidstop /dev/md0' and then used mke2fs
> to put filesystems on the ex-RAID partitions and then mounted them
> normally.
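>
> Roughly, the sequence was something like this (from memory, using the
> same mke2fs options as for the array; partition names as in the fdisk
> output below):
>
>     raidstop /dev/md0
>     mke2fs -b 1024 -i 1536 /dev/sda7
>     mke2fs -b 1024 -i 1536 /dev/sdb6
>     mount /dev/sda7 /pa
>     mount /dev/sdb6 /pb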
>
> > The only significant thing I can see is block reads @75% for this
> > particular sample.
>
> Exactly what concerns me. Given that the theoretical performance of RAID-1
> is 200% of one disk, 75% strikes me as a little low. (See the last set
> of bonnie runs in the message for more evidence on this.) As I'd really
> like to get this machine into production, I may just settle for 75%,
> but I am worried that this is a sign of something else being wrong. And
> the extra performance would be swell, too.
>
> > > SMP kernel (I only have one 450 MHz PII, but the board has a second
> > > processor slot that I'll fill eventually). It has 128 MB RAM, and the
> > > disks in question are identical quantum disks on the same NCR SCSI bus.
> >
> > Why the SMP kernel when there aren't 2 active CPUs?
>
> I plan to add one shortly. The docs claimed it wasn't a problem to run
> with a CPU missing, so this seemed like the easiest solution.
>
> > > mke2fs -b 1024 -i 1536 /dev/md0
> >
> > Why did you change bytes/inode to 1.5?
>
> Because the average size of the files I need to put on this partition
> is about 1600 bytes, and 1536 (1024+512) was nearby. With the default
> bytes-per-inode, I'd run out of inodes before all the content was
> loaded. With a more usual value like 1024, I would have too many
> inodes and not enough disk space.
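>
> For the curious, the arithmetic (using the partition size from the
> fdisk output below) works out to roughly:
>
>     # inodes created = filesystem bytes / bytes-per-inode
>     echo $((8353768 * 1024 / 1536))    # about 5.57 million inodes
>
> which leaves some headroom over one inode per average 1600-byte file.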
>
> > Also read the /usr/doc info on calculating stride.
>
> The version I have doesn't mention anything useful in connection with
> RAID-1, only RAID-4/5, so I left it alone. I'd be glad to change this
> to any reasonable number, though.
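>
> For the record, the stride syntax the docs describe looks something
> like this (only meaningful for striped arrays; the numbers here are
> invented for illustration):
>
>     # stride = chunk size / filesystem block size
>     mke2fs -b 4096 -R stride=8 /dev/md0    # e.g. 32k chunks, 4k blocks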
>
> Tim also asked for a bunch of output, which I will provide here:
>
> > Please post output from fdisk -l /dev/sda /dev/sdb.
>
> Disk /dev/sda: 255 heads, 63 sectors, 1106 cylinders
> Units = cylinders of 16065 * 512 bytes
>
>    Device Boot   Start     End    Blocks   Id  System
> /dev/sda1   *        1      33    265041   83  Linux
> /dev/sda2           34    1106   8618872+   5  Extended
> /dev/sda5           34      50    136521   82  Linux swap
> /dev/sda6           51      66    128488+  83  Linux
> /dev/sda7           67    1106   8353768+  fd  Unknown
>
> Disk /dev/sdb: 255 heads, 63 sectors, 1106 cylinders
> Units = cylinders of 16065 * 512 bytes
>
>    Device Boot   Start     End    Blocks   Id  System
> /dev/sdb1   *        1      48    385528+  83  Linux
> /dev/sdb2           50    1106   8490352+   5  Extended
> /dev/sdb3           49      49      8032+  83  Linux
> /dev/sdb5           50      66    136521   82  Linux swap
> /dev/sdb6           67    1106   8353768+  fd  Unknown
>
> > hdparm -tT /dev/md0 /dev/sda7 /dev/sdb6 /dev/md0 /dev/sda7 /dev/sdb6
>
> /dev/md0:
>  Timing buffer-cache reads:  64 MB in 0.55 seconds = 116.36 MB/sec
>  Timing buffered disk reads: 32 MB in 2.88 seconds =  11.11 MB/sec
>
> /dev/sda7:
>  Timing buffer-cache reads:  64 MB in 0.49 seconds = 130.61 MB/sec
>  Timing buffered disk reads: 32 MB in 2.49 seconds =  12.85 MB/sec
>
> /dev/sdb6:
>  Timing buffer-cache reads:  64 MB in 0.49 seconds = 130.61 MB/sec
>  Timing buffered disk reads: 32 MB in 2.44 seconds =  13.11 MB/sec
>
> /dev/md0:
>  Timing buffer-cache reads:  64 MB in 0.57 seconds = 112.28 MB/sec
>  Timing buffered disk reads: 32 MB in 2.84 seconds =  11.27 MB/sec
>
> /dev/sda7:
>  Timing buffer-cache reads:  64 MB in 0.50 seconds = 128.00 MB/sec
>  Timing buffered disk reads: 32 MB in 2.42 seconds =  13.22 MB/sec
>
> /dev/sdb6:
>  Timing buffer-cache reads:  64 MB in 0.48 seconds = 133.33 MB/sec
>  Timing buffered disk reads: 32 MB in 2.42 seconds =  13.22 MB/sec
>
> > [time the creation of 100 MB file]
>
> Here it is on the RAID filesystem:
>
> [root@venus local]# time dd if=/dev/zero of=/usr/local/100MBtest bs=1k count=100k && time dd if=/usr/local/100MBtest of=/dev/null bs=1k && time rm -rf /usr/local/100MBtest
> 102400+0 records in
> 102400+0 records out
> 0.070u 1.860s 0:05.79 33.3% 0+0k 0+0io 97pf+0w
> 102400+0 records in
> 102400+0 records out
> 0.130u 2.020s 0:09.82 21.8% 0+0k 0+0io 25691pf+0w
> 0.000u 0.100s 0:01.59 6.2% 0+0k 0+0io 88pf+0w
>
> And here it is on a regular filesystem:
> [root@venus local]# time dd if=/dev/zero of=/100MBtest bs=1k count=100k && time dd if=/100MBtest of=/dev/null bs=1k && time rm -rf /100MBtest
> 102400+0 records in
> 102400+0 records out
> 0.110u 1.700s 0:04.57 39.6% 0+0k 0+0io 91pf+0w
> 102400+0 records in
> 102400+0 records out
> 0.140u 1.840s 0:07.87 25.1% 0+0k 0+0io 25694pf+0w
> 0.000u 0.100s 0:02.11 4.7% 0+0k 0+0io 95pf+0w
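>
> In throughput terms, those wall-clock times work out to about
> 100/5.79 = 17.3 MB/s write and 100/9.82 = 10.2 MB/s read on the RAID,
> versus about 21.9 MB/s write and 12.7 MB/s read on the plain
> filesystem.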
>
> > Please post averaged results from several bonnie runs of 3x main memory.
>
> Sure thing. Here are the results from the 384 MB runs of bonnie on the
> RAID-1 device:
>
>               -------Sequential Output-------- ---Sequential Input-- --Random--
>               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
> Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
> md0       384  5611 82.0 13586 20.5  4264 12.3  6032 86.2  8884 10.8 117.5  3.4
> md0       384  5539 81.0 13568 21.0  4335 12.5  6038 86.2  8897 11.0 115.7  3.3
> md0       384  5680 83.5 13584 20.6  4300 12.9  6025 86.6  8784 10.9 117.7  3.6
> avg       384  5610 82.2 13579 20.7  4300 12.6  6032 86.3  8855 10.9 117.0  3.4
>
> And here are the runs for the same disk partitions, reformatted and
> remounted as /pa and /pb:
>
>               -------Sequential Output-------- ---Sequential Input-- --Random--
>               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
> Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
> pa        384  5991 85.3 13389 17.6  4305 11.2  6156 86.3 11954 12.5 100.8  2.4
> pa        384  6022 85.8 13326 17.9  4282 11.2  5884 82.7 11914 12.4 100.7  2.6
> pa        384  6079 87.0 13440 17.8  4312 11.1  5943 85.1 12123 14.0 101.2  2.3
> pb        384  6327 91.6 13490 18.1  4346 11.5  6135 86.7 12367 12.9 102.1  2.8
> pb        384  6124 88.5 13506 17.6  4351 11.5  6137 85.7 12410 12.8 103.4  2.7
> average   384  6109 87.6 13430 17.8  4319 11.3  6051 85.3 12154 12.9 102.0  2.6
>
> I also ran another test, which was interesting. I ran two copies of bonnie
> in parallel, one each on /pa and /pb, the same partitions that are used
> when I do the mkraid. Here are those performance numbers. Both bonnies
> started at the same time and finished within a second of each other:
>
>               -------Sequential Output-------- ---Sequential Input-- --Random--
>               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
> Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
> pa        384  3312 47.9 10074 14.6  4463 12.6  3494 49.4 12235 13.2  77.6  2.2
> pb        384  3332 47.9  9320 12.9  4411 11.7  3468 48.9 11991 13.4  83.5  1.8
> sum       768  6644 95.8 19394 27.5  8874 24.3  6962 98.3 24226 26.6 161.1  4.0
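>
> For reference, the two copies were started along these lines, using
> bonnie's usual -d/-s flags (from memory):
>
>     bonnie -d /pa -s 384 &
>     bonnie -d /pb -s 384 &
>     wait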
>
> This pretty clearly suggests that the hardware is capable of a lot more
> than the RAID-1 is actually delivering. I'd expect RAID-1 block writes
> to be as slow as 10 MB/s (although they seem faster here, perhaps
> because of the cache). But reads should at least be in the ballpark of
> 20-24 MB/s rather than the 9 MB/s I'm getting, shouldn't they?
>
> Thanks again to everyone for their help! Please also note that I'd be
> content with the answer "current implementation is slow". I know I'm
> working with prerelease software here, so I'm grateful for what I
> can get.
>
> William