The answers to your questions depend a lot on the type of
use you actually make of your file server. And for a lot
of that, you will want to make careful performance measurements
of your setup. For instance:
there is little advantage to multiple I/O paths simply for
pure "file server" use. The file server runs as a user
level process, so all reads and writes are "synchronous".
Unless the read is in the buffer pool (and it probably
isn't, because it wasn't in your client's cache), it
has to block for the read, and nothing else is going to
happen until that read returns. Writes could in theory
proceed asynchronously. However, the file server does
"fsync"s after writing each chunk, so it's still going to block
until the data reaches the disk. So, so far as the file
server goes, the only thing that counts is how fast a single
read or write is, not how many reads or writes it can have
pending.
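As a very rough illustration of that, you can compare ordinary
buffered writes against writes that are forced out to disk one
chunk at a time (this assumes a dd that understands GNU-style
oflag=dsync; /vicepa/ddtest is just an example scratch path on one
of the server's local partitions, nothing AFS-specific):

    # ordinary buffered writes
    time dd if=/dev/zero of=/vicepa/ddtest bs=64k count=256
    # force each 64k chunk to disk before the next write begins,
    # roughly what the per-chunk fsync behavior costs you
    time dd if=/dev/zero of=/vicepa/ddtest bs=64k count=256 oflag=dsync
    rm /vicepa/ddtest

The difference between those two numbers is dominated by exactly
that per-operation latency.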
So far as fast/slow SCSI goes, if you want to maximize
performance and care about nothing else, then it would
be worth running benchmarks on your proposed configurations
and measuring performance from a Unix read/write call
through the local filesystem on the server. SCSI bus transfer
overhead is just one stage in a complex sequence of events;
other factors, such as interrupt latency, controller card command
processing, seek time, rotational latency, and software block
allocation algorithms, all have to be tuned just right
for peak performance. For any other use, I believe it
would make more sense to base fast/slow SCSI decisions on other
factors, such as cost, availability, and convenience. It
may be far more valuable to have all of your servers using
the same kind of SCSI interface (so that you can swap drives &
cabling) than to try to fine-tune the hardware configuration
for each new model of file server.
If all your clients are concentrated on a relatively small
number of Ethernets, FDDI may not be a huge advantage.
(Unless you frequently shuffle volumes between file servers.)
Network performance can be just as important as
disk performance - and network cards can have their
weird internal controller delays & throughput problems
just like disk controllers and drives. Routers, too,
are almost never as fast as the networks they connect.
If your users tend to concentrate their work on a small number of files,
and mostly do reads, file server performance may be almost
completely irrelevant. Once the data is in the client's cache, the file
server is
pretty much out of the loop, and it's mostly a question of how fast
the client machine can shuffle data between the cache & programs.
Adding that "last little bit of cache" may make a larger
difference to what your users see than any amount of effort
on all the other pieces.
backups probably consume more of your file server than daily use. In
fact, for most cases, the greatest advantage to having a fast
file server is probably in terms of backups, not in terms
of actual file server response. Backups go through a separate
Unix process from the file server - so its I/O should in fact be
separate - and so there are some advantages to parallelism here.
The "dump" format for volumes is basically unaligned byte-stream
data; and there are plenty of other inefficiencies in the whole
backup process that all cost CPU. Probably the largest win would be
in making sure each machine has its own tape drive (to ensure the
least amount of network traffic) & it may be an advantage to put the
tape drive on its own SCSI channel (the exact performance hit would
be worth measuring). Scheduling can also make a difference;
if you can do your tape backups at night, you may not care
at all how much of the server it eats.
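One crude way to see how much of that is the dump itself rather
than the tape is to time a full dump of a representative volume to
the null device (the volume name here is just an example, and
you'll need the appropriate AFS admin privileges):

    # full dump (-time 0) of one volume, discarding the output, so
    # you see the dump & network cost with no tape drive involved
    time vos dump -id user.example -time 0 -file /dev/null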
database machine configuration is a key performance factor. The latest AFS
releases are a lot more memory hungry than older releases; and even if
your cell has not grown much, chances are your AFS backup database
is a lot larger than it used to be. Slow database machines
will slow down even the speediest clients & file servers
(and *will* even slow down backups), so it's worth making sure your
database servers aren't overloaded, are fast, have more
than enough memory to avoid paging, don't do anything but DB service,
and have plenty of fast disk for that.
Client machine configuration makes a difference too, as do network
paths and client subnet congestion. For best performance, you have to
look at the whole picture and identify the bottlenecks, instead of just
concentrating on one piece.
To get the numbers on performance bottlenecks:
the server & cache manager both come with extensive
instrumentation; with a bit of effort you can learn
far more about what each is doing than you ever
wanted to know. If you don't have source, the numbers may not
make as much sense - presumably Transarc can help there.
A good network sniffer can help you figure
out where the packets are going and what's being slow.
Even if you can't decipher the rx packets, knowing
which machines & ports things are flying between can
still help you identify which places to look at.
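If you don't have a real sniffer handy, something like tcpdump will
do for the "which machines & ports" part; AFS file server traffic
normally runs over UDP port 7000, so for example:

    # watch who is talking to the file server port, no name lookups
    tcpdump -n udp port 7000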
Standard Unix tools, such as vmstat, iostat, netstat,
and even ps should not be underestimated. "time" and "dd"
can be used as a very quick & dirty "disk performance"
benchmark, one you can even do in the offices of your not
quite favorite local sales sleaze when his back is turned.
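For instance, something along these lines gives you a rough
sequential read number in a minute or so (the device name is made
up - use whatever the raw disk device is called on the machine in
question, and you'll generally need root to read it):

    # read 64MB straight off the raw disk & see how long it takes
    time dd if=/dev/rsd0c of=/dev/null bs=64k count=1024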
You will probably get the best overall performance by having a relatively
large number of relatively small, fairly fast file servers, each with
its own tape drive, & with sufficient overall network capacity. Every Unix
workstation company makes a number of really nifty low-cost workstations
that perform almost as well as their big fat servers at a fraction of the
cost; for AFS use, these workstations make an attractive bargain, and
you'll probably even save enough buying these to do something crazy like
buying "hot spare"s. So for most purposes, you will probably save yourself
the most grief by standardizing on a relatively small number of configurations
and emphasizing interchangeability and flexibility first. For a more definitive
answer than that, you really need to study the bottlenecks, needs, and
constraints of your particular site.
I hope this helps, even if it wasn't a cookbook
"2X + 3Y = # of servers to buy".
-Marcus Watts
UM ITD RS Umich Systems Group