Our company is currently considering the purchase of an EMC Symmetrix box. In
simplest terms, if you haven't seen one, this box can house a sizeable disk
farm, to which you can attach multiple hosts. It also has a lot of other
niceties that are probably only tangential to the question at hand, so I'll stop
there.
In trying to determine if it would be beneficial to hook up our AFS servers to
this box and how such a configuration might look, I began to wonder what
exactly a "reasonable" AFS server configuration would look like otherwise,
i.e., without using the Symmetrix box. What is an optimal layout of RO & RW
volumes across AFS file servers? How does one maximize availability and
reliability while minimizing the impact of unplanned outages?
I thought I'd turn to the list members as "the voice of experience" and see if
I could get a sense of what exists in the way of "best practices", if anything.
I'd also like to find out whether any other sites have employed a Symmetrix box
for their AFS server storage.
Let me start by laying out our cell's current configuration and give you my
current thoughts on how I'd lay things out. We have a license for 7 AFS network
servers. Three of them are database-only servers running under Linux and
scattered about the campus. Of the other 4, which are dedicated AFS file
servers, 2 are SGI boxes with all their vice partitions on regular hard drives,
and the other 2 are Sun boxes with some vice partitions on RAID arrays.
My scheme was to put all non-replicated (RW) volumes (home directories, project
spaces, etc.) on the Sun RAID partitions, put the RW versions of replicated
volumes on the other Sun vice partitions, and dedicate the two SGI boxes to
holding identical copies of the RO versions of replicated volumes. I figured
this would allow us to take either of the 2 SGI boxes out of commission (for
maintenance, etc.) since the other SGI box would still be able to serve the
same set of RO volumes.
On both Sun servers I would keep enough extra RAID-protected partition space
free so that if I wanted to plan downtime on one of the servers I would only
need to temporarily move its (non-replicated) volumes to a RAID-protected vice
partition on the other Sun server. My only worry would then be unplanned
outages on the Sun servers, which could take out some portion of the
non-replicated volumes.
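To make the failure cases in this scheme easier to reason about, here is a minimal Python sketch of the layout described above. The server and volume names are hypothetical, and the model is deliberately simplified: it treats a volume as reachable if any copy (RO or RW) sits on a server that is still up, which is an assumption for illustration and not a statement of exactly how AFS clients choose a site.

```python
# Hypothetical layout: names are illustrative, not our real cell.
LAYOUT = {
    "sgi1": {"ro": {"man", "apps"}, "rw": set()},
    "sgi2": {"ro": {"man", "apps"}, "rw": set()},
    "sun1": {"ro": set(), "rw": {"home.alice", "man"}},
    "sun2": {"ro": set(), "rw": {"home.bob", "apps"}},
}


def all_volumes(layout):
    """Every volume name that has at least one copy somewhere."""
    vols = set()
    for copies in layout.values():
        vols |= copies["ro"] | copies["rw"]
    return vols


def lost_volumes(layout, down_server):
    """Volumes with no copy on any surviving server when down_server is offline."""
    reachable = set()
    for server, copies in layout.items():
        if server != down_server:
            reachable |= copies["ro"] | copies["rw"]
    return all_volumes(layout) - reachable


for server in LAYOUT:
    print(server, "down ->", sorted(lost_volumes(LAYOUT, server)))
```

In this toy model, losing either SGI box loses nothing, while losing a Sun box loses only the non-replicated volumes it holds, which matches the worry stated above.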
[I figured each server's RW vice partitions would need to be configured with
G/(n-1) GB, where "G" = total GB of usable RW storage required, and "n" = number of
RW servers. As an example, with G = 100 GB and n = 2, each server would need a
100 GB RAID partition, with 50 GB of RW volume data on each and 50 GB of free
space always available on each, so that I can move one server's 50 GB of data
to the other server in order to take the first server down without loss of
availability. That represents 200 GB of storage to serve 100 GB of data, or
100% of disk space "slush". As you add servers to the pool, you reduce "slush"
and spread out your RW vulnerability.]
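The arithmetic in the bracketed note can be checked with a small Python sketch. It assumes the RW data is spread evenly across the servers (my simplification; real volumes won't divide so neatly), and the function names are made up for illustration.

```python
def per_server_capacity_gb(total_rw_gb, n_servers):
    """Capacity each RW server needs so any one server can be drained: G/(n-1)."""
    if n_servers < 2:
        raise ValueError("need at least 2 RW servers to drain one of them")
    return total_rw_gb / (n_servers - 1)


def layout_summary(total_rw_gb, n_servers):
    """Per-server and total figures, assuming data is spread evenly."""
    per_server = per_server_capacity_gb(total_rw_gb, n_servers)
    data_per_server = total_rw_gb / n_servers
    return {
        "per_server_gb": per_server,
        "data_per_server_gb": data_per_server,
        "free_per_server_gb": per_server - data_per_server,
        "total_disk_gb": per_server * n_servers,
    }


print(layout_summary(100, 2))
print(layout_summary(100, 4))
```

With G = 100 and n = 2 this reproduces the 100 GB partition, 50 GB of data, 50 GB free, and 200 GB total from the example.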
My first cut at a Symmetrix configuration was to simply mirror our current
configuration: 2 servers serving RO volumes and 2 servers serving RW volumes,
each of the latter with enough extra capacity to hold all the RW volumes of the
other server.
My next thought was that I could configure -all- 4 servers identically, with
each one serving a RO partition and a RAID-protected RW partition. The upside
of this would be that I could spread out the vulnerable RW volumes over a
larger number of servers to minimize the impact of unplanned outages, and each
server would also need less disk space "slush" to handle the planned outages.
The downside then becomes ``do I really need 4 RO copies of the "man page"
volume?'' So, maybe not -all- of the servers have to serve RO volumes, but
what would be prudent?
Also, the graph of the curve G/(n-1) drops quickly with the first 3 to 5 or so
servers (n={3,4,5}). After that it begins to flatten out pretty quickly,
meaning each new server added provides less and less of a benefit in terms of
how much less "slush" you need on each one. (Measuring "slush" as free space
relative to the data served: when you go from 2 to 3 servers it goes from 100%
down to 50%, a big drop. But when you go from 6 to 7 servers it only goes from
20% to 16.67%, a small drop.)
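For anyone who wants to see the flattening without graphing it, here is a short Python sketch that tabulates the "slush" for a range of server counts. Note there are two ways to express it, and they give different percentages: free space relative to the data served, 1/(n-1), and free space relative to total disk, 1/n. The function names are mine, and even data distribution is assumed.

```python
def slush_vs_data(n_servers):
    """Free space as a fraction of the RW data served: 1/(n-1)."""
    return 1 / (n_servers - 1)


def slush_vs_disk(n_servers):
    """Free space as a fraction of total RW disk purchased: 1/n."""
    return 1 / n_servers


for n in range(2, 9):
    print(f"n={n}  vs data: {slush_vs_data(n):7.2%}  vs disk: {slush_vs_disk(n):7.2%}")
```

Either way the curve is steep for the first few servers and nearly flat after that.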
In general, though, does this seem like a sound scheme for organizing RO & RW
volumes across one's servers? How do other sites do it? Are there any other
sites using a Symmetrix box for their AFS server storage? Are there other
relevant factors that I'm overlooking?
Those are my real questions. I'd be happy for any insight the list members
might have.
Thanks,
--
Norman Joseph, UNIX Systems Engineer [EMAIL PROTECTED] IC|XC
Concurrent Technologies Corporation 814/269.2633 --+--
NI|KA
ftp://ftp.ctc.com/pub/PGP-keys/joseph.asc
*** Do a good deed today. Visit http://www.thehungersite.com ***