On Tue, 17 Feb 1998, Lyle Seaman wrote:

> Rainer Toebbicke wrote:
> > On Sun, 15 Feb 1998, Lyle Seaman wrote:
> > > replicated elsewhere. At the present time, I don't know of anyone who
> > > stores more than 60 GB of unreplicated read-write data on a single
> > > fileserver.  (Here's where everyone chimes in and tells me otherwise :-)
> > >
> > 
> > You knew it was going to happen: we do! Not 500, though.
> 
> Agh!  Serves me right. How much *unreplicated* data do you keep on a
> single server? 

We have 4.7 TB in our AFS cell. However, since this is multiple-resident AFS
(MR-AFS), that does not mean we actually have that much disk space.

For ease of maintenance all our servers run MR-AFS, but only five of
them actually migrate data to tape robot systems. The user home
directories and all the software reside on servers which do no data
migration.

The software volumes are stored on two servers: all RW volumes on one of
them, along with one set of RO volumes; the other contains only
RO volumes. This amounts to 53 GB on each server. Since these are all
replicated volumes, we use neither backup nor RAID in this case.

The user home directories reside on 4 servers, all of them with RAID 5
and hot spare disks. The partition sizes are between 40 and 54 GB. All
together this comes to 270 GB.

We are running all these servers with 32 threads without problems.

> 
> > 'attach' time *is* a nuisance 
> 
> We are continuing to work on this problem, though I can't say when a
> solution will be delivered.  Meanwhile, consider sorting your volumes so
> that the most frequently-used volumes are on "lower-lettered" partitions
> than the others.  Replicated volumes should be on the "highest-lettered"
> partitions.  Letter the partitions as needed to balance load across
> spindles or SCSI busses.  Pepper and salt to taste.

I cannot really see what advantage load balancing across spindles and
SCSI buses would have, since all I/O is synchronous: there is never
more than one I/O request active at a time. The only improvement you can
get is from disk striping, which shortens the time needed for a
single I/O.
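
To make the point concrete, here is a toy back-of-the-envelope model in C.
The numbers are made-up assumptions, not measurements from our servers; it
only illustrates why strictly serialized synchronous requests gain nothing
from extra independent spindles, while striping shortens each request:

    /* Toy model, not a measurement: strictly serialized synchronous I/O
     * gains nothing from extra independent spindles, but striping shortens
     * each single request.  All numbers are made-up assumptions. */
    #include <stdio.h>

    int main(void)
    {
        double seek_ms   = 10.0;  /* assumed seek + rotational latency per request */
        double xfer_mb_s =  5.0;  /* assumed per-disk transfer rate (MB/s)         */
        double req_mb    = 0.064; /* assumed request size (64 KB)                  */
        int    ndisks    = 4;

        /* One synchronous request at a time: the other spindles sit idle,
         * so the request time - and hence the throughput - does not change. */
        double t_plain = seek_ms + req_mb / xfer_mb_s * 1000.0;

        /* Striping the same request over all disks divides the transfer part;
         * the seek is still paid (once, in parallel on every disk). */
        double t_striped = seek_ms + (req_mb / ndisks) / xfer_mb_s * 1000.0;

        printf("serialized on %d disks: %.1f ms/request, %.0f requests/s\n",
               ndisks, t_plain, 1000.0 / t_plain);
        printf("striped over %d disks:  %.1f ms/request, %.0f requests/s\n",
               ndisks, t_striped, 1000.0 / t_striped);
        return 0;
    }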

If you run into I/O bottlenecks you have to increase the number of
servers. This is more efficient, and sometimes even cheaper, than
upgrading single servers to higher I/O bandwidth.

> > Having said this: the original question was about a 500 GB fileserver. As
> > far as I understand restart time is governed by number of volumes to
> > attach - one could cut down on those by making each one 2 GB and bigger.
> > We have seen occasions on which something like that does not sound
> > ridiculous.
> 
> Unfortunately, you won't be able to backup, move, or replicate those
> volumes > 2GB.

With MR-AFS, of course, you can. Our biggest volumes are larger than 300 GB
 - not all of that data on disk, of course.

> Restart time is governed both by number of volumes and number of
> inodes.  Volumes with lots of symlinks, for instance, will take longer
> to attach than their size would indicate.

Attach times could be much shorter if the vnode bitmaps were stored
on disk at detach time and then simply read back in during attach,
instead of reading all vnodes to find out which ones are in use and which are not.
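
As an illustration only - this is a minimal sketch of the scheme just
described, not AFS or MR-AFS source code; the file format, the names and
the fallback scan are all my own assumptions - attach would read a saved
bitmap in one go and fall back to the full vnode scan only when no
trustworthy copy from a clean detach exists:

    /* Sketch of the idea above (not actual AFS code): dump the in-use vnode
     * bitmap at detach time, read it back at attach time, and fall back to
     * the expensive full vnode scan only if no trustworthy copy exists. */
    #include <stdio.h>
    #include <string.h>

    #define BITMAP_MAGIC 0xb17b17b1u

    struct bitmap_hdr {
        unsigned int magic;   /* identifies a valid bitmap file             */
        unsigned int clean;   /* set only when written by an orderly detach */
        unsigned int nwords;  /* number of 32-bit words that follow         */
    };

    /* Placeholder for today's slow path: derive the bitmap by reading
     * every vnode.  Left as a stub here. */
    static void scan_all_vnodes(unsigned int *bits, unsigned int nwords)
    {
        memset(bits, 0, nwords * sizeof *bits);
    }

    /* Called at detach time. */
    int save_bitmap(const char *path, const unsigned int *bits, unsigned int nwords)
    {
        struct bitmap_hdr h;
        FILE *f = fopen(path, "wb");

        h.magic = BITMAP_MAGIC;
        h.clean = 1;
        h.nwords = nwords;
        if (!f)
            return -1;
        if (fwrite(&h, sizeof h, 1, f) != 1 ||
            fwrite(bits, sizeof *bits, nwords, f) != nwords) {
            fclose(f);
            return -1;
        }
        return fclose(f);
    }

    /* Called at attach time.  Returns 0 on the fast path, 1 if it had to scan. */
    int load_bitmap(const char *path, unsigned int *bits, unsigned int nwords)
    {
        struct bitmap_hdr h;
        FILE *f = fopen(path, "rb");

        if (f && fread(&h, sizeof h, 1, f) == 1 &&
            h.magic == BITMAP_MAGIC && h.clean && h.nwords == nwords &&
            fread(bits, sizeof *bits, nwords, f) == nwords) {
            fclose(f);
            return 0;                   /* fast attach: one small read  */
        }
        if (f)
            fclose(f);
        scan_all_vnodes(bits, nwords);  /* crash or old volume: scan all */
        return 1;
    }

The clean flag would of course have to be invalidated as soon as the volume
is attached for writing, so that after a crash the full scan (or salvage)
still happens.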

- Hartmut
-----------------------------------------------------------------
Hartmut Reuter                           e-mail [EMAIL PROTECTED]
                                           phone +49-89-3299-1328
RZG (Rechenzentrum Garching)               fax   +49-89-3299-1301 
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-----------------------------------------------------------------
