> People are talking about performance at about 20MB/s using UDMA, but
> my SCSIs are so slow!

I'll take some (educated?) guesses.

> * box one: Linux 2.0.35 (have been running for about 10 months):
> 128MB RAM, Pentium 233

1) Seems like that would be a very old raid patch... back from the days
   when Ingo admitted the raid5 overhead was poor.  *shrug*

> Adaptec 2940WU
> software RAID5=3 x ST3455N (7200rpm, Ultra-SCSI)+ 1 IBM DCAS-34330W
> (5400rpm, wide), 16k chunk size

2) I've seen my own performance get better at 4k chunk size
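
   For reference, changing the chunk size means re-creating the array.
   A rough /etc/raidtab sketch (device names are just placeholders, and
   the exact syntax depends on your raidtools version) would look
   something like:

        raiddev /dev/md0
          raid-level      5
          nr-raid-disks   4
          chunk-size      4
          device          /dev/sda1
          raid-disk       0
          device          /dev/sdb1
          raid-disk       1
          device          /dev/sdc1
          raid-disk       2
          device          /dev/sdd1
          raid-disk       3

   followed by a "mkraid /dev/md0" (which wipes the existing array, so
   back everything up first).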

3) Any raid (in my experience) gets throttled by its slowest individual
   drive, which makes sense to me: we can't have the 5400 rpm drive
   queueing up thousands of commands while we're 10 MB further along on
   the others.  Perhaps try taking out the 5400 and (optionally)
   replacing it with a 7200, then rebuilding.  My performance jumped a
   good bit going from 7200 to 10krpm drives, so I'd imagine the
   5400-to-7200 difference is big as well.
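
   A quick way to confirm the per-drive gap (device names below are just
   guesses, substitute your own, and do this while the array is
   otherwise idle) is to time a raw sequential read off each component:

        for d in sda sdb sdc sdd
        do
            echo "$d:"
            time dd if=/dev/$d of=/dev/null bs=1024k count=64
        done

   64 MB divided by the elapsed seconds gives MB/sec per drive, and the
   5400 should stand out.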

> 2940WU also connects 1 HP C1533A and 1 EXABYTE EXB-8505 tape drives, in
> addition to the 4 Disks.

4) Could be that the other devices are forcing the scsi bus down to
   a slower speed.  IIRC the 2940's device driver prints out the speed
   it negotiates with each device (recent versions say MB/sec where
   previous ones would say, e.g., 20 MHz, 16-bit instead of just
   40 MB/sec).  What does the driver say?
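
   If the boot messages have scrolled off, something like this (the
   exact wording varies with the aic7xxx driver version) should dig up
   the negotiation lines:

        dmesg | grep -i synchronous

   or just look back through /var/log/messages for what the driver
   printed at boot.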

> bonnie says:

5) not that it matters, but try removing the first 9 chars of each
   line in a bonnie run to help prevent wrapping.  I'll do it here.

>      -------Sequential Output-------- ---Sequential Input-- --Random--
>      -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
>   MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU
>  100  5610 46.4  6703 13.4  2594 11.7  7012 47.1  8485 14.4 150.9 4.0

6) Although I'd love to hear someone give a better explanation, I've
   always been told to run bonnie with a size at least twice main
   memory, so in your case "bonnie -s 256".  (Of course, for s/w raid to
   make up for its greater distance from the drives, you typically want
   a lot more host CPU power than the processor on an available h/w raid
   card provides.)
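
   In other words, something along the lines of (the mount point is just
   an example):

        bonnie -d /mnt/raid -s 256

   so that your 128MB of RAM can't cache a meaningful fraction of the
   test file.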

> sync;date;if=/dev/md0 of=/dev/zero bs=1024k count=1024;date
> gives 7.5MB/s
> sync;date;if=/dev/zero  of=/tmp/1gb bs=1024k count=1024;date
> gives 4.9MB/s

7) As was pointed out to me, a final sync before the closing date is
   really needed to help ensure accuracy, but these numbers do correlate
   with what's expected given your bonnie results.
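
   i.e. something closer to this (assuming the "dd" just got dropped
   when pasting):

        sync; date; dd if=/dev/zero of=/tmp/1gb bs=1024k count=1024; sync; date

   so the closing date isn't taken while dirty buffers are still being
   flushed out.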

> * box two: SCO OpenServer (can't replace it with Linux because we have
> Informix 4GL running):
> 96MB RAM, Pentium 200MMX
> AcceleRAID 150
> hardware RAID5=4 x Quantum Atlas 10K (10000rpm)

8) Due to the possible use of MMX instructions in s/w raid, a processor
   swap might be in order :)  Of course, I'm not sure if that's the case
   for the raid version you're running on the 2.0.35 kernel.
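
   The newer raid patches benchmark their xor routines at boot and log
   the winner, so if your patch level does that at all, something like

        dmesg | grep raid5

   should show whether an MMX checksum routine actually got picked.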

9) *of course* 10krpm drives are going to whip 3 7200's and a 5400 :)
   Just about any raid is going to be bound by its drives.  I'm actually
   a little surprised it only pulled 5.8 MB/sec on the dd write.

> sync;date;if=/dev/p2d4 of=/dev/null bs=1024k count=1024;date
> gives 15MB/s
> sync;date;if=/dev/zero of=/tmp/1gb bs=1024k count=1024;date
> gives 5.8MB/s
> 
> 2 x 256MB ECC & registered SDRAM and 1 Pentium III 450 will soon replace
> this testing machine for SCO as a production DBMS server.
> 
> AcreAltos Pro 960S (Pentium Pro 200) has been serving as our production
> DBMS server for 2+ years. Users are complaining about its slow response
> when selecting from big tables. So I am testing the Mylex 150 on SCO and
> found that its performance is so low compared to other people's Linux
> software RAIDs.

A) FWIW, you can check my previous results posted to this list re: my
   h/w and s/w raid combination testing.  Admittedly, that was on
   4 x 500MHz Xeons, but only using MMX, not KNI.  *But* the h/w raid
   was Mylex's best card (eXtremeRAID 1100), and doing the xors on the
   4 main processors made a huge difference in performance.  I had
   flashed the card with the latest of every piece of code off of
   mylex.com to make sure it wasn't an old firmware issue.

> I don't much mind the low performance of Linux as its load is light, but
> I will be a dead man if the new hardware for SCO does not work much
> faster than the current production hardware.
> 
> I am afraid that I have made a wrong decision in using the Mylex 150. Or,
> hopefully, someone can teach me how to tune the Mylex 150 and thus much
> improve its performance, if I am lucky enough.

B) All my s/w+h/w raid combos have gotten better with a lower s/w chunk
   size.  I'll be testing the Mylex chunk sizes, dropping from their
   current setting of 64k, and then checking write-thru vs. write-back,
   although write performance isn't your big problem for large table
   selects.

Of course, large table selects can be bound by other things like the
quality of your indexes and even silly things like tnsnames.ora ordering.
But that's quite off-topic for this list :)  Time for my questions :)

Question 1: Since ext2fs+VFS don't really care much about the block device
            they're working with, I'm afraid there's *zero* capability
            for ensuring that a given file starts on a chunk-size
            boundary, and therefore lots of fragmenting and
            reconstruction would be occurring in the s/w raid (or is
            this not the case?).  Could the vfs check for major=9 and
            fetch the chunk-size for alignment?

            Alternatively, is there a method for telling the offset into
            the block device of the beginning of a file (or a local file
            pointer) all the way up at the vfs layer?  I think
            null-padding out to a chunk-size boundary (parsed from
            raidtab) would be fine too :)
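
            Just to make the concern concrete, here's a trivial sketch
            of the mapping I'm worried about (numbers purely
            illustrative, and ignoring parity rotation):

                # 16k chunks, 3 data disks; a file starting 20480 bytes
                # into the device begins 4096 bytes into chunk 1
                CHUNK=16384; NDATA=3; OFF=20480
                echo "chunk  : $(( OFF / CHUNK ))"
                echo "disk   : $(( (OFF / CHUNK) % NDATA ))"
                echo "offset : $(( OFF % CHUNK )) bytes into the chunk"

            Every such mid-chunk start is the partial-stripe,
            read-modify-write case instead of a clean full-stripe write.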

Question 2: S/W raid gives us some neat capabilities.  One I'd love to see
            implemented is a "dangerous-no-sync" option (pretty much the
            mkraid option, but one you could turn on and off on-the-fly).

            This would let me write a huge set of files (in my case
            64 1GB files) with nothing but the data stripes getting
            written out (and therefore we can hopefully keep the data
            out of the cache hierarchy and stick to simple DMAs out to
            the scsi controller).  This should get us close to (n-1)/n
            times the raid0 write performance.  Once all the heavy work
            is done and the drives would otherwise just be sitting idle,
            the "dangerous-no-sync" option is turned back off, and the
            s/w raid can do its normal "recovery" and reconstruct the
            parity stripes (ideally just the necessary ones, all of them
            in the worst case).
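
            Back-of-envelope for that (n-1)/n figure, with made-up
            per-drive numbers purely for illustration: four drives at
            ~7 MB/sec sequential each would give a raid0 write of
            ~28 MB/sec, so (4-1)/4 of that is ~21 MB/sec for the no-sync
            raid5 write, instead of paying the read-modify-write path
            the whole time.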

James Manning
-- 
Miscellaneous Engineer --- IBM Netfinity Performance Development
