> Does anybody have information or experience on whether it is better to
> try to use the entire RAID for one partition or to split it up into
> smaller partitions? And if it is partitioned, how large or small
> should the partitions be? (The RAID disks are 50GB and 67GB.)
I've spent Christmas playing with my new toy - a Sparc Storage Array
connected to a Sparc 20 running Solaris 2.5.1, DiskSuite 4.1 (you can't use
the Veritas s/w because it doesn't work with the AFS fsck, according to
replies I got from a previous post regarding RAID) and afs3.4a. I'm getting
the distinct impression that a bit of experimentation is a must. At the
moment I'm using RAID 0+1 (striped & mirrored), and I intend to set up
another server with RAID 5 soon. How you stripe or mirror across disks does
seem to make a big performance difference, but my findings are completely
subjective at the moment because I'm still trying to populate the new
partitions with test data before I can do any real objective measurements. I
haven't had the chance to start playing with read/write options or interleave
factors yet, but I will do.
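For anyone who hasn't played with DiskSuite, this is roughly the shape of one
of the striped & mirrored metadevices - the slice names and the 64k interleave
below are just examples from my test setup, not a recommendation:

  # two 2-way stripes, 64k interleave
  metainit d11 1 2 c1t0d0s0 c2t0d0s0 -i 64k
  metainit d12 1 2 c1t1d0s0 c2t1d0s0 -i 64k
  # mirror one stripe on top of the other
  metainit d10 -m d11
  metattach d10 d12
  # make a filesystem and mount it as an AFS vice partition
  newfs /dev/md/rdsk/d10
  mount /dev/md/dsk/d10 /vicepa

The RAID 5 setup is basically the same idea, but with "metainit -r" over
three or more slices instead of the stripe/mirror pair.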
I've created 4 partitions, 2 x 9.7 GB, 1 x 7.8 GB and 1 x 1.1 GB, each with
the striping/mirroring set up a different way. On each partition I've created
various volumes up to 2 GB (the maximum volume size; you can create volumes
bigger than that, but you won't be able to move or dump them). In each volume
I'm creating a number of large files, and in doing that I've found an
irritating problem: if on a client you do something like
tar -cf test.vol.1/foo.1 /usr &
tar -cf test.vol.1/foo.2 /root &
tar -cf test.vol.1/foo.3 /scratch &
cp test.vol.2/big_file_1 test.vol.3 &
Very soon you get error messages on the console saying
*** Cache partition is full - decrease cachesize!!! ***
even if the cache is physically 100 MB and cacheinfo sets the virtual
cachesize to 40 MB. Eventually the cp's or tar's fail. This may be a problem
for the users who want the new server for large seismic data sets.
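For what it's worth, the numbers I'm quoting come from the usual places: the
third field of /usr/vice/etc/cacheinfo is the cache size in 1K blocks, and you
can compare that against what the cache manager and the disk report:

  cat /usr/vice/etc/cacheinfo   # e.g. /afs:/usr/vice/cache:40000 = 40 MB
  fs getcacheparms              # blocks the cache manager thinks it's using
  df -k /usr/vice/cache         # what the cache partition really has free

That should show whether it's the 40 MB cachesize or the 100 MB partition
that is actually filling up when the messages appear.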
At one point I was creating large files in this way on the server itself,
which is also an AFS client. Shortly after getting the cache messages, all of
my active processes doing cp's and tar's hung and could not be killed. I
tried a reboot, but it hung after a "warm shutting down rx" message or
something like that. Stop-A followed by a sync and a reboot brought the
system back OK, but I find this worrying.
> I guess some of the concerns are better managability of volumes and how
> quick a salvage would be on one vs. many partitions. I figure people are
> going to have different setups and opinions, but I am interested in
> anyone's experience on this.
At the moment I'm just trying to make sure everything works OK, and find
the optimum RAID configuration. After that, our users want the largest
volumes possible, so that's 2 GB. To avoid wasting disk when volumes are only
partially full, you need to oversubscribe partitions, so I figure a partition
needs to be at least 10 times the size of the volumes on it, maybe more if
you want to oversubscribe it without the danger that the partition will fill
up as soon as someone tries to fill a partially used volume. Therefore I
guess I'll need 20-40 GB partitions. I'm not too worried
about how long a salvage takes - hopefully we won't need to do it very
often. I'm more concerned about the performance and reliability of the
underlying RAID stuff.
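For anyone who wants to compare notes on the oversubscription figures, I'm
just reading them off the standard commands (the server and cell names below
are placeholders):

  vos partinfo <fileserver>          # free vs. total space on each /vicepX
  fs listquota /afs/<cell>/somedir   # quota, usage and % of partition used
  vos examine test.vol.1             # size and status of an individual volume

With 2 GB volumes on a 20 GB partition, any one volume filling up costs at
most 10% of the partition, which is the kind of headroom I'm after when
oversubscribing.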
I haven't got much time to do any more with the server now, other than leave
it running to see if it stays up, because I'm off to USENIX and will be out
of the office for a few weeks. I am, however, interested in other people's
experience with RAID, particularly ODS and SSA.
Thanks,
--
Andy Haxby [EMAIL PROTECTED]
Systems Gynaecologist +31 70 3112187
Shell International Exploration and Production
Research Technical Services, Rijswijk, Netherlands.