We've been noticing some AFS performance problems recently (they could have been there all along; until recently no one has had the time to stop and look at things like this). Granted, there are many obvious factors to consider on a topic like this, so I'll start with a description of our setup.

All file servers and clients are running AFS 3.3 (not 3.3a). The servers are all SPARC 2 machines running SunOS 4.1.3:

  - 2 machines with 12 gigs of disk each, serving user volumes and application volumes.
  - 1 machine with 12 gigs of disk, serving project volumes.
  - 1 machine with 24 gigs of disk, serving data volumes. These are mostly static data sets archived under AFS, and they change very little.
  - 1 machine designated as our sole AFS database server (VLDB, KAS, PTS).

All of these machines are on our FDDI ring. Most clients have a 50-70 meg cache.

Now the problem, flat out, is that we're experiencing times of around 40 seconds to transfer a non-cached 5 meg file from AFS-space to another machine directly on the FDDI ring. No router is passed through, and the client machine in question is 10 feet away (it is one of 8 compute nodes in a cluster).

Our FDDI ring is surely not saturated. Simple 5 meg file tests have been done at various times throughout the day under normal workload, and all of them have shown times from 20 to 40 seconds to grab a 5 meg file from AFS and store it on a local disk. We have also ruled out I/O contention on the client's SCSI bus.

Anyway, what I'm looking for is some general guidance on this matter and possibly a plan of attack for singling out the problem area. Any suggestions, case studies, or personal test results would be greatly appreciated.

-------
Jeff Blaine
CIESIN Operations
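
For scale, the timings quoted above can be converted into effective throughput and compared with the ring's nominal rate. This is only a rough sketch under stated assumptions: "meg" is taken to mean 2**20 bytes, FDDI is taken at its nominal 100 Mbit/s, and the function names are made up for illustration.

```python
# Convert the reported transfer times into effective throughput.
# Assumptions (not from the original post): "meg" = 2**20 bytes,
# and the comparison point is FDDI's nominal 100 Mbit/s rate.
MEG = 2 ** 20
FDDI_NOMINAL_BITS_PER_SEC = 100_000_000


def throughput_kb_per_sec(size_bytes, seconds):
    """Effective transfer rate in KB/s (1 KB = 1024 bytes)."""
    return size_bytes / seconds / 1024


def percent_of_fddi(size_bytes, seconds):
    """Effective rate as a percentage of the nominal FDDI rate."""
    bits_per_sec = size_bytes * 8 / seconds
    return 100 * bits_per_sec / FDDI_NOMINAL_BITS_PER_SEC


# The post reports 20 to 40 seconds for a 5 meg file.
for seconds in (20, 40):
    size = 5 * MEG
    print(
        f"{seconds:>2} s: "
        f"{throughput_kb_per_sec(size, seconds):.0f} KB/s, "
        f"{percent_of_fddi(size, seconds):.2f}% of nominal FDDI"
    )
```

Under those assumptions the reported times work out to roughly 128 to 256 KB/s, or about 1 to 2 percent of the ring's nominal rate, which is consistent with the ring itself not being the bottleneck.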
