Brian White <[EMAIL PROTECTED]> writes:
}We have recently been asked to increase our AFS filespace from about 75GB
}(distributed primarily between 2 SPARCserver 1000s and a SPARCcenter 2000, all
}running Solaris 2.5.1) to about 500GB. We will be allocated funds for at least
}one more fileserver. Our servers are connected via a 100Mbit switch.
}How are large sites handling disk distribution amongst fileservers, disk
}connections to those servers (SCSI channels? Fiberchannel?), and backups for
}this amount of disk? What size individual disks do you use? We intend to speak
}with Transarc re: recommendations, but site descriptions and relevant info
}would be appreciated.
We serve about 1000 clients (userbase of 35,000).
We are not quite at 500GB, but for our fileservers we are using:
3 DEC AlphaServer 1000a's each with
DEC's FDDI interface
DEC's 3-channel FastWide SCSI Raid Controllers
DEC's Hot-Swappable disk cabinets
DEC's 7200 RPM FastWide 4.3GB disks (70 disks total)
DEC's 20GB DLT tape drive on its own SCSI interface
In my opinion a locally attached DLT is the only way
to go for backups of this size. Today we would
probably use ATM or 100baseT, but in 1990 FDDI was king,
so we continue with it...
DEC's drives have been very reliable and can deliver
12-15MB/s (locally, not through AFS) in the RAID-5
config we are using.
However,...
We are suffering fairly frequent "slowdowns" which
keep us from getting the performance I know these
machines can deliver.
For example, even when the students are away, I get
only about 2.5MB/s (as seen by a very fast machine
used as an AFS client), but that's only 1/5th or 1/6th
of the 12-15MB/s I can get accessing the server's
disks locally (e.g. "time dd if=/.../bigfile of=/...").
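For anyone wanting to reproduce that local-vs-AFS comparison, here is a minimal sketch of the local half of the test. The file size and paths are hypothetical; on real server disks you would want a file much larger than the buffer cache, and you would repeat the read from an AFS client path to get the remote number.

```shell
# Rough local-disk read throughput check (hypothetical size/paths -- adjust
# for your /vicepX partition; use a file far larger than RAM on a real test).
SIZE_MB=64
TESTFILE=$(mktemp /tmp/afsbench.XXXXXX)
dd if=/dev/zero of="$TESTFILE" bs=1048576 count=$SIZE_MB 2>/dev/null
START=$(date +%s)
dd if="$TESTFILE" of=/dev/null bs=1048576 2>/dev/null
END=$(date +%s)
ELAPSED=$((END - START)); [ "$ELAPSED" -eq 0 ] && ELAPSED=1
echo "read $SIZE_MB MB in ${ELAPSED}s (~$((SIZE_MB / ELAPSED)) MB/s)"
rm -f "$TESTFILE"
```

Running the same read through an AFS mount point instead of the local path gives you the client-side figure to compare against.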
More seriously, when our load goes up (e.g., as the
semester goes on), we suffer long pauses (30 secs,
1 min, ... N minutes) many times during the day
(49 times so far today).
I have been using Dan Hamel's (of SAS) excellent script
'watch-afs' to keep track of our servers, and it shows
3 problems which appear to be related -- for example,
here is one taken at random from today's log:
11Feb98:10:53:06 afs-3 has 93 calls waiting for threads
11Feb98:10:53:07 afs-3 19 pct (1056/5367) spurious reads
11Feb98:10:53:08 afs-3 37 pct (1664/4472) resends
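Figures like the "calls waiting" count above come from rxdebug-style output. As a sketch (the sample text below is canned, not live output), extracting that number so you can log or alert on it might look like this; in practice you would feed it from `rxdebug <fileserver> 7000` instead:

```shell
# Sketch: pull the "calls waiting for a thread" count out of rxdebug-style
# output.  The sample here is canned; replace it with real rxdebug output.
sample="Free packets: 214, packet reclaims: 0, calls: 5367, used FDs: 42
93 calls waiting for a thread"
WAITING=$(printf '%s\n' "$sample" | awk '/calls waiting for a thread/ {n=$1} END {print n}')
echo "calls waiting: $WAITING"
```

A count that stays above zero for any length of time means requests are queueing behind the fileserver's worker threads.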
We do not appear to be suffering from any resource
(cpu, diskio, netio, memory) shortages, so...
It appears to me that once the offered load exceeds
some amount, things start falling apart. I suspect
this has something to do with the limit of 16
"worker" threads in the fileserver.
So, all that is a rather long way of saying that
I don't think the amount of data on AFS servers
is the critical thing; rather, you need to look
at what kind of load you will be generating.
500GB could be 10 of 100 users logged in,
each with 5GB, or 1000 of 50,000 users,
each with 10MB -- I suspect the results would be
very different.
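Spelling out those two scenarios: the storage totals are identical, but the concurrent load differs by two orders of magnitude:

```shell
# The two 500GB scenarios above, worked out (decimal units, 1GB = 1000MB).
STORAGE_A=$((100 * 5 * 1000))      # 100 users x 5GB each, in MB
STORAGE_B=$((50000 * 10))          # 50,000 users x 10MB each, in MB
CONCURRENT_A=10
CONCURRENT_B=1000
echo "same ${STORAGE_A}MB of data either way;"
echo "concurrent load differs by $((CONCURRENT_B / CONCURRENT_A))x"
```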
John
PS, for anyone else seeing perf problems: getting Dan's
script, documenting them, and filing a TroubleReport
with Transarc would be a "good thing". You can reference
our Trouble Reports: TR-40673 (Dan's) & TR-40975 (mine).
--
John Hascall, Software Engr. Shut up, be happy. The conveniences you
ISU Computation Center demanded are now mandatory. -Jello Biafra
mailto:[EMAIL PROTECTED]
http://www.cc.iastate.edu/staff/systems/john/welcome.html <-- the usual crud