We've been noticing some AFS performance problems recently (they could have
been there all along -- no one until recently has had the time to stop and
look at things like this).  Granted, there are many obvious factors to
consider on a topic like this.  So I'll start out with a description of our
setup:

   All file servers and clients are running AFS 3.3 (not 3.3a).

   We have 2 SPARC 2 machines running SunOS 4.1.3 with 12 Gigs each of disk
   serving user volumes and application volumes.

   We have 1 SPARC 2 machine running SunOS 4.1.3 with 12 Gigs of disk serving
   project volumes.

   We have 1 SPARC 2 machine running SunOS 4.1.3 with 24 Gigs of disk serving
   data volumes.  These are mostly static data sets archived under AFS, and
   they change very little.

   We have 1 SPARC 2 machine running SunOS 4.1.3 designated as our sole
   AFS database server (VLDB, KAS, PTS).

   All of these machines are on our FDDI ring.

   Most clients have a 50-70 meg cache.

Now the problem, flat out, is that it takes around 40 seconds to transfer
a non-cached 5 meg file from AFS-space to another machine directly on the
FDDI ring (no router is passed through, and the client machine in question
is 10 feet away -- one of 8 compute nodes in a cluster).
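
For a sense of scale, here is a rough back-of-the-envelope calculation of
what those transfer times mean as throughput.  It is only a sketch, and it
assumes that "5 meg" means 5 * 2**20 bytes and that the ring runs at FDDI's
nominal 100 Mbit/s; the function names are made up for illustration.

```python
# Back-of-the-envelope throughput for a 5 meg transfer.
# Assumptions (not measurements): "5 meg" = 5 * 2**20 bytes,
# and the FDDI ring runs at its nominal 100 Mbit/s.
FILE_BYTES = 5 * 2**20
FDDI_BITS_PER_SEC = 100_000_000


def throughput_kb_per_sec(nbytes, seconds):
    """Average transfer rate in KB/s (1 KB = 1024 bytes)."""
    return nbytes / 1024 / seconds


def percent_of_fddi(nbytes, seconds):
    """Average transfer rate as a percentage of the nominal FDDI rate."""
    return 100.0 * (nbytes * 8 / seconds) / FDDI_BITS_PER_SEC


for secs in (20, 40):
    print(f"{secs} s: {throughput_kb_per_sec(FILE_BYTES, secs):.0f} KB/s, "
          f"{percent_of_fddi(FILE_BYTES, secs):.1f}% of nominal FDDI rate")
```

Under those assumptions, 20 to 40 seconds works out to roughly 128-256 KB/s,
or about 1-2% of what the ring can nominally carry.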

Our FDDI ring is surely not saturated, and simple 5 meg file tests
have been done at various times throughout the day under normal
workload, all of which have shown times from 20 seconds to 40 seconds
to grab a 5 meg file from AFS and store it on a local disk.  We have
also ruled out I/O contention on the client's SCSI bus.
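
One way a test like this could be instrumented is to time the read side
and the write side of the copy separately, so that time spent fetching from
AFS is not lumped together with time spent writing to the local disk.  The
following is a minimal sketch and not the test we actually ran; the
function name and the 64 KB chunk size are arbitrary choices for
illustration.

```python
import os
import time

CHUNK = 64 * 1024  # bytes per read; an arbitrary choice for this sketch


def timed_copy(src, dst):
    """Copy src to dst and return (bytes_copied, read_seconds, write_seconds)."""
    total = 0
    read_s = 0.0
    write_s = 0.0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            t0 = time.perf_counter()
            buf = fin.read(CHUNK)
            t1 = time.perf_counter()
            read_s += t1 - t0
            if not buf:  # end of file
                break
            fout.write(buf)
            fout.flush()
            # Push the data to disk so the write time is not hidden by
            # buffering.  This slows the copy down, but makes the split honest.
            os.fsync(fout.fileno())
            t2 = time.perf_counter()
            write_s += t2 - t1
            total += len(buf)
    return total, read_s, write_s
```

Calling timed_copy() with an uncached AFS file as src and a local-disk path
as dst gives the two times side by side.  Running it again with a local
file as src gives a baseline for the local disk alone.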

Anyway, what I'm looking for is just some general guidance with this
matter and possibly a plan of attack for singling the problem area
out.  Any suggestions and case studies or personal test results would
be greatly appreciated.

-------
Jeff Blaine
CIESIN Operations
