There are a number of factors that are more important than any you
mentioned, such as:
        user population and habits
        amount of data to be stored
        amount of data that will change
        backup needs - how often, etc.
        client machines - how many, what kind.
        network configuration - what's between the clients & servers.
        budget constraints - how much can you spend?

So far as performance goes, the network is likely to be your
real bottleneck.  I.e., ethernet is so much slower than SCSI
that the difference between fast SCSI-II and Fast&Wide
is likely to be impossible to measure.  It may be
advantageous to locate the server on FDDI even if your
clients are on ethernet - not because any one client will see
faster transfers, but because the server's link can then carry
the aggregate traffic of several ethernet segments.  If the
network is at all flaky, this will show up first in AFS - a 1-2%
packet loss rate on the client's local subnet will hurt AFS
performance noticeably (even if the fileserver is fast, reliable,
and its subnet is fine), and periodic network outages will
mean periodic filesystem outages.  Some network cards are
better than others - if there is a choice it may be worth
spending a bit of time measuring network delay & throughput.
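
To put rough numbers on that (assuming 10 Mbit/s ethernet and
standard SCSI-2 ratings - adjust for your hardware): ethernet tops
out around 1.25 MB/s, while fast SCSI-II is rated at 10 MB/s and
Fast&Wide at 20 MB/s, so even the slower disk channel is nearly an
order of magnitude faster than the wire.  For the measuring, a
crude point-to-point throughput test is easy to improvise; below
is a minimal Python sketch (the port number and transfer size are
arbitrary assumptions - run it as "recv" on one host and "send"
on the other):

    #!/usr/bin/env python
    # Crude TCP throughput probe - a sketch, not a benchmark.
    # Start "probe.py recv" on one host, then run
    # "probe.py send <host>" on the other.
    import socket, sys, time

    PORT = 9999                # arbitrary assumption
    TOTAL = 32 * 1024 * 1024   # bytes sent per run
    CHUNK = 64 * 1024

    def recv():
        srv = socket.socket()
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("", PORT))
        srv.listen(1)
        conn, peer = srv.accept()
        total = 0
        while True:
            buf = conn.recv(CHUNK)
            if not buf:
                break
            total += len(buf)
        conn.close()
        print("received %d bytes from %s" % (total, peer[0]))

    def send(host):
        s = socket.create_connection((host, PORT))
        data = b"x" * CHUNK
        t0 = time.time()
        sent = 0
        while sent < TOTAL:
            s.sendall(data)
            sent += len(data)
        s.shutdown(socket.SHUT_WR)
        s.recv(1)              # wait for the far end to drain
        dt = time.time() - t0
        print("%d bytes in %.2f s: %.2f MB/s" % (sent, dt, sent / dt / 1e6))

    if __name__ == "__main__":
        if sys.argv[1] == "recv":
            recv()
        else:
            send(sys.argv[2])

Run it a few times at quiet and busy hours; the spread matters as
much as the peak.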

In the typical computing environment, most files are small - disk
access time is going to be more important than disk transfer rates.
Multiple spindles *may* be an advantage in terms of having
more arms to seek.  Multiple disk controllers on a single server probably
won't buy much in terms of disk performance because there is only one
fileserver process per machine and it makes synchronous kernel calls to
fetch data from the disk.  It will probably make more sense to buy
multiple small "desktop" style machines and put only a few drives
per machine, instead of trying to buy a single large "server"
machine.  Each machine may not
be so fast, but the combined throughput should be much better, and there
are additional benefits in terms of redundancy, scalability, and
reliability.
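
A quick back-of-the-envelope for the small-file point (the drive
specs here are assumptions for a typical disk - ~10 ms average
seek, ~4 ms rotational latency, ~5 MB/s media rate): fetching an
8 KB file costs roughly 14 ms of positioning but under 2 ms of
transfer, so the arm, not the transfer rate, dominates.  Every
extra spindle is another arm that can be positioning concurrently,
which is the same reason several cheap machines beat one big one
on aggregate throughput.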

Serving files doesn't usually consume much CPU - but backups do.
Even when just serving files, though, a fast CPU does decrease
response time.

Timewise, it is best to have a tape drive on each file server.
Backups over the network are *much* slower.  It would be advantageous
to avoid doing backups during periods of peak demand, because of
how much CPU they can eat.  If backups are done on the local
machine, it may be advantageous to locate the tape drive on a
different controller - the drive was undoubtedly designed to
maximize tape throughput while minimizing cost, and simultaneous
operation alongside "online" disk traffic may not have been
seriously contemplated.
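
Back-of-the-envelope on why network backups hurt (the tape and
ethernet rates here are assumptions - plug in your own): a local
drive streaming at ~500 KB/s dumps 10 GB in roughly 6 hours;
pushed over a shared 10 Mbit/s ethernet that realistically
delivers maybe 300 KB/s to a single stream, the same dump takes
closer to 10 hours and competes with client traffic the whole
time.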

There are obvious cost tradeoffs here concerning the number
of file server machines, tape drives, & DB servers.

DB servers should also be figured into this planning, because
any slowness there will show up as overall slowness of the whole
cell.  It would be advantageous to avoid putting any serious file
storage on the DB servers.  If backups don't need to be done very
often, perhaps doing them over the network would be an advantage,
in terms of keeping the backup CPU load off the servers.

Transarc does provide a variety of tools to measure performance.
But old-fashioned tools such as "iostat", "ps", and "ping" can
also give useful data.  I think there have been papers at some
of the AFSUG meetings that described some of the tools and
overall strategy.
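
For the packet-loss figure mentioned earlier, plain "ping" is
enough if you drive it long enough.  Below is a small sketch that
shells out to the stock ping and parses its summary; the summary
format varies between systems, so treat the regexps as assumptions
to adjust:

    #!/usr/bin/env python
    # Estimate packet loss and average round-trip time to a host
    # by running the stock "ping" and parsing its summary lines.
    # Summary formats differ between systems - adjust the regexps.
    import re
    import subprocess
    import sys

    def probe(host, count=100):
        out = subprocess.run(["ping", "-c", str(count), host],
                             capture_output=True, text=True).stdout
        # e.g. "100 packets transmitted, 98 received, 2% packet loss"
        m = re.search(r"(\d+) packets transmitted, (\d+)[a-z ]*received", out)
        if m is None:
            sys.exit("could not parse ping output - adjust regexps")
        sent, recvd = int(m.group(1)), int(m.group(2))
        # e.g. "round-trip min/avg/max = 0.412/0.563/1.201 ms"
        r = re.search(r"= [\d.]+/([\d.]+)/", out)
        avg = float(r.group(1)) if r else float("nan")
        return 100.0 * (sent - recvd) / sent, avg

    if __name__ == "__main__":
        loss, avg = probe(sys.argv[1])
        print("loss %.1f%%, avg rtt %.1f ms" % (loss, avg))

A steady 1-2% loss on a client subnet is exactly the kind of
trouble described above.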

                                -Marcus Watts
                                UM ITD RS Umich Systems Group
