On Tue, 22 Oct 1996, Jan Johansson wrote:
>
> Somewhere in the middle of the restore, everything began
> to crawl, and we had to stop restoring volumes to it.
> It took 2 days until the server was working alright again,
> and since then it jumps up to >96% iowait as soon as we
> try to 'vos move' or restore more accounts to this partition.
>
...
>
> Experimenting a bit with SIGSTOP/SIGCONT on the volserver during a
> freeze shows us it is that process that causes the iowait,
> and as soon as we SIGCONT it, it immediately goes back to waiting
> for something we have not identified.
>
> Has anyone else with large partitions seen this behaviour before?
>
> It worked alright up to the point when the partition had used up
> ~12-13Gb of the ~16 available.
>
>
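The SIGSTOP/SIGCONT experiment described above can be sketched roughly as
below. This uses a 'sleep' process as a stand-in for the volserver (on the
real server you would look up the volserver's PID with ps first); the
stand-in and the exact commands are illustrative, not from the original post.

```shell
# Freeze the suspect process and see whether iowait drops; resume it and
# see whether iowait climbs again.
sleep 60 &                                     # stand-in for the volserver
pid=$!
kill -STOP "$pid"                              # freeze the suspect
sleep 1
state=$(ps -o state= -p "$pid" | tr -d ' ')    # 'T' = stopped
kill -CONT "$pid"                              # resume it
sleep 1
state2=$(ps -o state= -p "$pid" | tr -d ' ')   # back to 'S' (sleeping)
kill "$pid" 2>/dev/null
```

While the process sits in state 'T', the machine's iowait should fall off if
that process is indeed the one generating the I/O load.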
We have seen a similar problem on a server using 6GB SDS RAIDs: during the
'clone' of a biggish volume with thousands of files, the volserver would
effectively stop the machine, to the point that even clients connecting to
the fileserver timed out.
We suspected too many I/O requests queuing up - requests typically complete
more slowly on a RAID than on a plain disk.
We never found the real cause. In the end what helped was that Transarc gave
us a volserver supporting a '-sleep' parameter: we set it up so that the
volserver sleeps 2 seconds every 30 seconds of sustained activity. This way
the RAID gets a chance to 'cool down' and the fileserver to say hello to all
its clients.
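The '-sleep' throttle amounts to pausing periodically during sustained work.
A minimal sketch of that rhythm is below; the function and variable names are
made up for illustration, and only the 30-second/2-second cadence comes from
the setup described above.

```shell
WORK_BEFORE_SLEEP=30   # seconds of sustained activity between pauses (assumed knob)
PAUSE=2                # length of each pause in seconds (assumed knob)

process_chunk() {      # stand-in for one unit of volserver work
    echo "copied $1"
}

throttled_copy() {
    start=$(date +%s)
    for chunk in "$@"; do
        process_chunk "$chunk"
        now=$(date +%s)
        if [ $((now - start)) -ge "$WORK_BEFORE_SLEEP" ]; then
            sleep "$PAUSE"            # let queued I/O drain
            start=$(date +%s)
        fi
    done
}
```

The pause gives the disk queue a chance to empty, so other processes (here,
the fileserver answering its clients) are not starved of I/O for minutes at
a stretch.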
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke http://wwwcn1.cern.ch/~rtb -or- [EMAIL PROTECTED] O__
European Laboratory for Particle Physics(CERN) - Geneva, Switzerland > |
Phone: +41 22 767 8985 Fax: +41 22 767 7155 ( )\( )