On Tue, 22 Oct 1996, Jan Johansson wrote:

> 
> Somewhere in the middle of the restore, everything began
> to crawl, and we had to stop restoring volumes to it.
> It took 2 days until the server was working alright again,
> and since then it jumps up to >96% iowait as soon as we
> try to 'vos move' or restore more accounts to this partition.
> 
...
> 
> Experimenting a bit with SIGSTOP/SIGCONT on 'volserver' during a
> freeze shows us it is that process that causes the iowaits;
> as soon as we SIGCONT it, it immediately resumes waiting
> for something we have not identified.
> 
> Is there anyone else who has large partitions and/or has seen
> this behaviour before?
> 
> It worked fine up to the point when the partition had used up
> ~12-13 GB of the ~16 GB available.
> 
> 
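The freeze/resume experiment described above can be reproduced in miniature. This is a hypothetical Python sketch for Linux (it reads /proc); a sleep(1) child stands in for the volserver, whose real PID you would use instead:

```python
import os
import signal
import subprocess
import time

def process_state(pid):
    # Read the one-letter state field from /proc/<pid>/stat (Linux):
    # 'T' = stopped by a signal, 'S' = sleeping, 'R' = running.
    with open(f"/proc/{pid}/stat") as f:
        return f.read().split(")")[-1].split()[0]

# A sleep loop stands in for the volserver in this sketch.
child = subprocess.Popen(["sleep", "60"])
time.sleep(0.1)

os.kill(child.pid, signal.SIGSTOP)   # freeze the process: its I/O stops
time.sleep(0.1)
stopped = process_state(child.pid)   # expect 'T' (stopped)

os.kill(child.pid, signal.SIGCONT)   # resume: the activity starts again
time.sleep(0.1)
running = process_state(child.pid)   # back to 'S'/'R'

child.terminate()
child.wait()
```

Watching iowait (e.g. with iostat) while toggling the signals is what pins the load on that one process.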

We have seen a similar problem on a server using 6 GB SDS RAIDs: during the
'clone' of a biggish volume with thousands of files the volserver would
effectively stop the machine, so that even clients connecting to the
fileserver timed out.

We suspected too many I/O requests queuing up: individual requests are
typically slower on a RAID than on a plain disk, so during sustained
activity the queue grows faster than it drains.

We never found the real cause. In the end what helped was that Transarc gave
us a volserver supporting a '-sleep' parameter: we set it up so that the
volserver sleeps for 2 seconds after every 30 seconds of sustained activity.
That way the RAID gets a chance to 'cool down' and the fileserver gets a
chance to say hello to all its clients.
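The effect of that '-sleep' parameter can be sketched as a simple throttle. This is only an illustration of the idea, not the volserver's actual implementation; the Throttle class, checkpoint method, and the injectable clock/sleep hooks are hypothetical names chosen here:

```python
import time

class Throttle:
    """Sketch of a '-sleep'-style throttle: after `active` seconds of
    sustained work, pause for `pause` seconds so queued I/O can drain.
    clock/sleep are injectable so the logic can be tested without
    real waiting."""

    def __init__(self, active=30.0, pause=2.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.active = active
        self.pause = pause
        self.clock = clock
        self.sleep = sleep
        self.mark = clock()      # start of the current activity window
        self.pauses = 0          # how many cool-down naps were taken

    def checkpoint(self):
        # Call between units of work (e.g. after each chunk copied).
        if self.clock() - self.mark >= self.active:
            self.sleep(self.pause)   # let the RAID 'cool down'
            self.pauses += 1
            self.mark = self.clock()
```

A copy loop would then call `checkpoint()` after every chunk; with the defaults that yields a 2-second nap for every 30 seconds of sustained activity, matching the settings described above.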

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke    http://wwwcn1.cern.ch/~rtb -or- [EMAIL PROTECTED]  O__
European Laboratory for Particle Physics(CERN) - Geneva, Switzerland   > |
Phone: +41 22 767 8985       Fax: +41 22 767 7155                     ( )\( )
