Brian White <[EMAIL PROTECTED]> writes:
}We have recently been asked to increase our AFS filespace from about 75GB
}(distributed primarily between 2 SPARCserver 1000s and a SPARCcenter 2000, all
}running Solaris 2.5.1) to about 500GB. We will be allocated funds for at least
}one more fileserver. Our servers are connected via a 100Mbit switch.
}How are large sites handling disk distribution amongst fileservers, disk
}connections to those servers (SCSI channels? Fiberchannel?), and backups for
}this amount of disk? What size individual disks do you use? We intend to speak
}with Transarc re: recommendations, but site descriptions and relevant info
}would be appreciated.
We serve about 1000 clients (userbase of 35,000).
We are not quite at 500GB, but for our fileservers we are using:
3 DEC AlphaServer 1000a's each with
DEC's FDDI interface
DEC's 3-channel FastWide SCSI Raid Controllers
DEC's Hot-Swappable disk cabinets
DEC's 7200 RPM FastWide 4.3GB disks (70 disks total)
DEC's 20GB DLT tape drive on its own SCSI interface
In my opinion a locally attached DLT is the only way
to go for backups of this size. Today we would
probably use ATM or 100baseT, but in 1990 FDDI was king,
so we continue with it...
DEC's drives have been very reliable and can deliver
12-15MB/s (locally, not through AFS) in the RAID-5
config we are using.
However,...
We are suffering fairly frequent "slowdowns" which
keep us from getting the performance I know these
machines can deliver.
For example, even when the students are away, I get
only about 2.5MB/s (as seen by a very fast machine
used as an AFS client), but that's only 1/5th or 1/6th
of the 12-15MB/s I can get accessing the server's
disks locally (e.g. "time dd if=/.../bigfile of=/...").
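For anyone wanting to reproduce that local-vs-AFS comparison, here is a minimal sketch of the local half of the test. The file size and paths are hypothetical; on real server disks you would want a file much larger than the buffer cache, and you would repeat the read from an AFS client path to get the remote number.

```shell
# Rough local-disk read throughput check (hypothetical size/paths -- adjust
# for your /vicepX partition; use a file far larger than RAM on a real test).
SIZE_MB=64
TESTFILE=$(mktemp /tmp/afsbench.XXXXXX)
dd if=/dev/zero of="$TESTFILE" bs=1048576 count=$SIZE_MB 2>/dev/null
START=$(date +%s)
dd if="$TESTFILE" of=/dev/null bs=1048576 2>/dev/null
END=$(date +%s)
ELAPSED=$((END - START)); [ "$ELAPSED" -eq 0 ] && ELAPSED=1
echo "read $SIZE_MB MB in ${ELAPSED}s (~$((SIZE_MB / ELAPSED)) MB/s)"
rm -f "$TESTFILE"
```

Running the same read through an AFS mount point instead of the local path gives you the client-side figure to compare against.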
More seriously, when our load goes up (e.g., as the
semester goes on), we suffer long pauses (30 secs,
1 min, ... N minutes) many times during the day
(49 times so far today).
I have been using Dan Hamel's (of SAS) excellent script
'watch-afs' to keep track of our servers, and it shows
3 problems which appear to be related -- for example,
here is one taken at random from today's log:
11Feb98:10:53:06 afs-3 has 93 calls waiting for threads
11Feb98:10:53:07 afs-3 19 pct (1056/5367) spurious reads
11Feb98:10:53:08 afs-3 37 pct (1664/4472) resends
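Figures like the "calls waiting" count above come from rxdebug-style output. As a sketch (the sample text below is canned, not live output), extracting that number so you can log or alert on it might look like this; in practice you would feed it from `rxdebug <fileserver> 7000` instead:

```shell
# Sketch: pull the "calls waiting for a thread" count out of rxdebug-style
# output.  The sample here is canned; replace it with real rxdebug output.
sample="Free packets: 214, packet reclaims: 0, calls: 5367, used FDs: 42
93 calls waiting for a thread"
WAITING=$(printf '%s\n' "$sample" | awk '/calls waiting for a thread/ {n=$1} END {print n}')
echo "calls waiting: $WAITING"
```

A count that stays above zero for any length of time means requests are queueing behind the fileserver's worker threads.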
We do not appear to be suffering from any resource
(cpu, diskio, netio, memory) shortages, so...
It appears to me that once the offered load exceeds
some amount, things start falling apart. I suspect
this has something to do with the limit of 16
"worker" threads in the fileserver.
So, all that is a rather long way of saying that
I don't think the amount of data on AFS servers
is the critical thing; rather, you need to look
at what kind of load you will be generating.
500GB could be 10 of 100 users logged in,
each with 5GB, or 1000 of 50,000 users,
each with 10MB -- I suspect the results would be
very different.
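Spelling out those two scenarios: the storage totals are identical, but the concurrent load differs by two orders of magnitude:

```shell
# The two 500GB scenarios above, worked out (decimal units, 1GB = 1000MB).
STORAGE_A=$((100 * 5 * 1000))      # 100 users x 5GB each, in MB
STORAGE_B=$((50000 * 10))          # 50,000 users x 10MB each, in MB
CONCURRENT_A=10
CONCURRENT_B=1000
echo "same ${STORAGE_A}MB of data either way;"
echo "concurrent load differs by $((CONCURRENT_B / CONCURRENT_A))x"
```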
John
PS, for anyone else seeing perf problems: getting Dan's
script, documenting them, and filing a TroubleReport
with Transarc would be a "good thing". You can reference
our Trouble Reports: TR-40673 (Dan's) & TR-40975 (mine).
--
John Hascall, Software Engr. Shut up, be happy. The conveniences you
ISU Computation Center demanded are now mandatory. -Jello Biafra
mailto:[EMAIL PROTECTED]
http://www.cc.iastate.edu/staff/systems/john/welcome.html <-- the usual crud