> People are talking about performance at about 20MB/s using UDMA, but
> my SCSIs are so slow!
I'll take some (educated?) guesses.
> * box one: Linux 2.0.35 (have been running for about 10 months):
> 128MB RAM, Pentium 233
1) seems like this would be a very old kernel patch... back from the days
when Ingo admitted the raid5 overhead was poor. *shrug*
> Adaptec 2940WU
> software RAID5=3 x ST3455N (7200rpm, Ultra-SCSI) + 1 IBM DCAS-34330W
> (5400rpm, wide), 16k chunk size
2) I've seen my own performance get better at 4k chunk size
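For reference, the chunk size is just the chunk-size line in /etc/raidtab.
Here's a minimal sketch, assuming the 0.90-style raidtools and made-up
device names (and remember that changing chunk-size means re-running
mkraid, which rebuilds the array, so don't do it casually on live data):

  raiddev /dev/md0
      raid-level              5
      nr-raid-disks           4
      persistent-superblock   1
      chunk-size              4     # in KB
      device                  /dev/sda1
      raid-disk               0
      device                  /dev/sdb1
      raid-disk               1
      device                  /dev/sdc1
      raid-disk               2
      device                  /dev/sdd1
      raid-disk               3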
3) any raid (in my experience) will get throttled by the slowest individual
drive (makes sense to me, as we can't have the 5400 rpm drive queueing
up thousands of commands while we're 10 MB further along on the others).
Perhaps try taking out the 5400 and (optionally) replacing it with a
7200, then rebuilding.  My performance jumped a good bit going from
7200 to 10k rpm drives, so I'd imagine the 5400-to-7200 gap is big as well.
> 2940WU also connects 1 HP C1533A and 1 EXABYTE EXB-8505 tape drives, in
> addition to the 4 Disks.
4) Could be that the other devices are forcing the scsi bus down to
a slower speed.  IIRC the 2940's device driver should print out
what speed it negotiates with each device (recent versions say MB/sec
where previous versions would say e.g. 20 MHz, 16-bit instead of just
40MB/sec).  What does the driver say?
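To check, something like the following should do it (just a sketch; the
/proc path assumes the stock aic7xxx driver and host adapter number 0):

  # boot-time negotiation messages from the aic7xxx driver
  dmesg | grep -i scsi

  # per-device sync/wide negotiation details, if your kernel exposes
  # the driver's /proc entry
  cat /proc/scsi/aic7xxx/0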
> bonnie says:
5) not that it matters, but try removing the first 9 chars of each
line in a bonnie run to help prevent wrapping. I'll do it here.
> -------Sequential Output-------- ---Sequential Input-- --Random--
> -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
> MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU
> 100 5610 46.4 6703 13.4 2594 11.7 7012 47.1 8485 14.4 150.9 4.0
6) although I'd love to hear someone give a better explanation, I've always
been told to run bonnie with a size at least twice main memory so the
buffer cache can't hide the drives, which in your case means "bonnie -s 256".
(Of course, for s/w raid to help counteract its distance from the drives,
you typically want a lot more host CPU power than the processing on an
available h/w raid device.)
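Something like this, for instance (just a sketch; /mnt/md0 stands in for
wherever the array's filesystem is actually mounted):

  # 256MB working set on a 128MB box, run against the raid filesystem
  bonnie -d /mnt/md0 -s 256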
> sync;date;dd if=/dev/md0 of=/dev/zero bs=1024k count=1024;date
> gives 7.5MB/s
> sync;date;dd if=/dev/zero of=/tmp/1gb bs=1024k count=1024;date
> gives 4.9MB/s
7) As was pointed out to me, a final sync before the closing date is really
needed for the write test to be accurate, but these numbers correlate with
what's expected given your bonnie results.
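For reference, the write test with the extra sync folded in would look like:

  sync; date; dd if=/dev/zero of=/tmp/1gb bs=1024k count=1024; sync; date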
> * box two: SCO OpenServer (can't replace it with Linux because we have
> Informix 4GL running):
> 96MB RAM, Pentium 200MMX
> AcceleRAID 150
> hardware RAID5=4 x Quantum Atlas 10K (10000rpm)
8) Due to the possible use of MMX instructions in s/w raid, a processor
swap might be in order :) Of course, I'm not sure if that's the case
for the raid version you're running on the 2.0.35 kernel.
9) *of course* 10krpm drives are going to whip 3 7200's and a 5400 :)
Just about any raid is going to be bound by the drives.  I'm actually
a little surprised it only pulled 5.8 MB/sec on the dd write.
> sync;date;dd if=/dev/p2d4 of=/dev/null bs=1024k count=1024;date
> gives 15MB/s
> sync;date;dd if=/dev/zero of=/tmp/1gb bs=1024k count=1024;date
> gives 5.8MB/s
>
> 2 x 256MB ECC & registered SDRAM and 1 Pentium III 450 will soon replace
> this testing machine for SCO as a production DBMS server.
>
> AcerAltos Pro 960S (Pentium Pro 200) has been serving as our production
> DBMS server for 2+ years.  Users are complaining about its slow response
> when selecting big tables.  So I am testing the Mylex 150 on SCO and have
> found its performance very low compared to other people's Linux software
> RAIDs.
A) FWIW, you can check my previous results posted to this list re: my h/w and
s/w raid combination testing.  Admittedly, it was 4 x 500MHz Xeons, but
only using MMX, not KNI.  *But*, the h/w raid was Mylex's best card
(eXtremeRAID 1100), and doing the XORs on the 4 main processors
made a huge difference in performance.  I had flashed the card with
the latest of every piece of code off of mylex.com to make sure it
wasn't an old firmware issue.
> I don't much mind the low performance of Linux as its load is light, but
> I will be a dead man if the new hardware for SCO does not work much
> faster than the current production hardware.
>
> I am afraid that I have made a wrong decision in using the Mylex 150.  Or,
> hopefully, someone can teach me how to tune the Mylex 150 and thus greatly
> improve its performance, if I am lucky enough.
B) All my s/w+h/w raid combos have gotten better with a lower s/w chunk size.
I'll be testing the Mylex chunk sizes by dropping them from their current
setting of 64k, and then checking write-thru vs. write-back, although
write performance isn't your big problem for large table selects.
Of course, large table selects can be bound by other things like good
indexes and even silly things like tnsnames.ora ordering.  But that's
quite off-topic for this list :)  Time for my questions :)
Question 1: Since ext2fs+VFS don't really care much about the block device
they're working with, I'm afraid there's *zero* capability
for ensuring that a given file starts on a chunk-size boundary,
and therefore lots of fragmenting and reconstruction would be
occurring in the s/w raid (or is this not the case?).  Could
vfs check for major=9 and fetch the chunk-size for alignment?
Alternatively, is there a method for finding the offset within a
block device of the beginning of a file (or local file pointer)
all the way up at the vfs layer?  I think null-padding out to a
chunk-size boundary (parsed from raidtab) would be fine too :)
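A userspace sketch of that null-padding idea, with a hypothetical file name
and a hard-coded 16k chunk (note it only rounds the file length up to a
chunk multiple; it can't control where the file actually lands on the md
device):

  CHUNK=16384
  SIZE=`wc -c < bigfile`
  # bytes needed to round the length up to the next chunk boundary
  PAD=`expr \( $CHUNK - $SIZE % $CHUNK \) % $CHUNK`
  # append that many zero bytes
  dd if=/dev/zero bs=1 count=$PAD >> bigfile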
Question 2: S/W raid gives us some neat capabilities.  One I'd love to see
implemented is a "dangerous-no-sync" option (pretty much the
mkraid option, but one that can be turned on and off on-the-fly).
What this allows is writing a huge set of files
(in my case 64 1GB files) with nothing but the data stripes
getting written out (and therefore we can hopefully keep the data
out of the cache hierarchy and stick to simple DMAs out to the
scsi controller).  This should get us close to (n-1)/n times the
raid0 write performance.  Once all the heavy work is done and the
drives would otherwise be sitting idle, the "dangerous-no-sync"
option is turned back off, and the s/w raid can do its normal
"recovery" and reconstruct the parity stripes (ideally only the
ones that need it, but all of them in the worst case).
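To put rough illustrative numbers on that (purely made up for the sake of
the arithmetic): with 4 drives each good for ~8 MB/sec of sequential writes,
raid0 would be around 32 MB/sec, so a no-sync raid5 write pass should land
near (4-1)/4 * 32 = 24 MB/sec until the parity pass is run.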
James Manning
--
Miscellaneous Engineer --- IBM Netfinity Performance Development