On Wed, 11 Feb 1998 [EMAIL PROTECTED] wrote:

>      More seriously, when our load goes up (e.g., as the
>      semester goes on), we suffer long pauses (30 secs,
>      1 min, ... N minutes) many times during the day
>      (49 times so far today).
>

We used to run into similar problems and finally concluded that it was 
partly due to UDP overruns between the FDDI-connected servers and the 
sheer number of clients. This was confirmed by astronomically high 
udpInOverflows counts as reported by 'netstat -s' (Solaris).  When we 
increased the UDP buffer size by a significant factor (from 64 KB to 
1 MB) most of the problems went away.
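
For reference, the socket-level knob behind that change is SO_RCVBUF. Below
is a minimal stand-alone sketch of that call (not the fileserver code;
whether the real tuning goes through a fileserver option or an ndd kernel
parameter is a separate question):

    /*
     * Sketch only: create a UDP socket, raise its receive buffer to 1 MB
     * with SO_RCVBUF, then read the value back.  On Solaris the kernel
     * parameter udp_max_buf caps what the kernel will actually grant.
     */
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int main(void)
    {
        int  sock = socket(AF_INET, SOCK_DGRAM, 0);
        int  want = 1 * 1024 * 1024;    /* 1 MB instead of the previous 64 KB */
        int  got  = 0;
        socklen_t len = sizeof(got);

        if (sock < 0) {
            perror("socket");
            return 1;
        }
        if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &want, sizeof(want)) < 0)
            perror("setsockopt(SO_RCVBUF)");
        if (getsockopt(sock, SOL_SOCKET, SO_RCVBUF, &got, &len) == 0)
            printf("UDP receive buffer is now %d bytes\n", got);
        return 0;
    }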

We recently started trying to get a better idea of how the 'fileserver'
process interacts with the disk subsystem. To this end we modified the 
code so that the main disk I/Os (the ones used to read and write data, 
not the accesses for vnode lookups, ACLs and the like) run in (POSIX) 
threads, while the individual RX LWP waits until the I/O has finished 
but other RX calls are allowed to go on.
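
To illustrate the pattern, here is a stand-alone sketch (invented names, not
our actual modification; it uses a plain pthread condition variable where the
real code resynchronizes with the RX LWP package):

    /*
     * Sketch of the handoff: the blocking pread() runs in a POSIX thread
     * and is timed; the caller sleeps on a condition variable until the
     * worker signals completion.
     */
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/time.h>
    #include <fcntl.h>
    #include <unistd.h>

    struct io_req {
        int             fd;
        void           *buf;
        size_t          len;
        off_t           off;
        ssize_t         result;
        double          elapsed;        /* seconds spent in the pread itself */
        int             done;
        pthread_mutex_t lock;
        pthread_cond_t  cv;
    };

    static double now(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    /* Runs in its own POSIX thread so only this request blocks on the disk. */
    static void *io_worker(void *arg)
    {
        struct io_req *r = arg;
        double t0 = now();

        r->result = pread(r->fd, r->buf, r->len, r->off);

        pthread_mutex_lock(&r->lock);
        r->elapsed = now() - t0;
        r->done = 1;
        pthread_cond_signal(&r->cv);    /* wake the waiting caller */
        pthread_mutex_unlock(&r->lock);
        return NULL;
    }

    int main(int argc, char **argv)
    {
        static char buf[65536];
        struct io_req r;
        pthread_t tid;

        r.fd = open(argc > 1 ? argv[1] : "/etc/hosts", O_RDONLY);
        if (r.fd < 0) {
            perror("open");
            return 1;
        }
        r.buf = buf;
        r.len = sizeof(buf);
        r.off = 0;
        r.result = 0;
        r.elapsed = 0;
        r.done = 0;
        pthread_mutex_init(&r.lock, NULL);
        pthread_cond_init(&r.cv, NULL);

        pthread_create(&tid, NULL, io_worker, &r);

        /* The caller waits here; in the fileserver this is where the RX
         * LWP would yield so that other RX calls can proceed. */
        pthread_mutex_lock(&r.lock);
        while (!r.done)
            pthread_cond_wait(&r.cv, &r.lock);
        pthread_mutex_unlock(&r.lock);

        printf("read %ld bytes, I/O took %.3f s\n", (long) r.result, r.elapsed);
        pthread_join(tid, NULL);
        return 0;
    }

The expensive pread() blocks only the worker thread; the caller sleeps until
it is signalled, which is where the fileserver lets the other RX calls run.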

All the individual stages are timed and the maxima are printed in the 
FileLog. And there are surprises:

1. we are running on a Sun Ultra 2 with an Ultra-SCSI-attached RSM2000 RAID 5
system with 128 MB of cache. Even so, the maximum (not the average!) real
service time for an individual I/O was almost 2.5 seconds (!!). That's an
eternity for such a machine! I wonder what was going on in the system.

2. but it can be worse, although the question now is to what extent our mixture
of POSIX and RX LWP threads is to blame: when the POSIX thread finishes, it
resynchronizes with the usual RX LWP package via a signal. Surprisingly, in one
case this took slightly over 4 seconds (!), during which the fileserver must
have been busy doing other things. 

The maximum aggregate elapsed time for a single I/O request was thus 4.2 
seconds, a time during which an FDDI ring can easily fill any UDP buffer 
you could reasonably allocate. You therefore lose packets, which will most 
likely be recovered only after various timeouts.
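
Back of the envelope: FDDI runs at 100 Mbit/s, i.e. roughly 12 MB/s, so even
a 1 MB UDP buffer fills in well under 100 ms; over a 4-second stall something
like 50 MB can arrive with nowhere to go.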


Similar surprises also appear on standard, unmodified configurations: 

'afsmonitor' shows two fields, 'FetchData max' and 'StoreData max': on one of
our servers these read 415 and 306 seconds!! If they mean what their names
lead me to think they mean, I conclude that in one case one guy patiently
twiddled his thumbs for 5 minutes whereas in the other he probably went for a
coffee, cursing... unless both were multi-megabyte transfers. The highest
'StoreData max' was 1121 seconds (!!). Plenty of time to pick up the phone
and start yelling - to the point where I would be happy if somebody could
speak up and say "look, it's not that bad, because you misunderstood this and
that". 



In all of this: it is the peaks that bother me. They do not reflect the
overall (average) performance of even the >170 GB servers, which is absolutely
acceptable, if not brilliant, compared to other things I have seen.


However, if somebody talks about setting up a single 500 GB fileserver: 
think about how long it would take to reload that much data from backup tapes
should it ever get damaged. I believe that you can realistically pipe 2-2.5
gigabytes per *hour* onto a local tape drive. 200 hours working full time
makes.... 
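
(500 GB at 2-2.5 GB per hour works out to 200-250 hours, i.e. well over a
week of round-the-clock restoring.)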

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke    http://wwwcn1.cern.ch/~rtb -or- [EMAIL PROTECTED]  O__
European Laboratory for Particle Physics(CERN) - Geneva, Switzerland   > |
Phone: +41 22 767 8985       Fax: +41 22 767 7155                     ( )\( )
