We have a problem on one of our servers.
It is a Sparc 10 running Solaris 2.4 and AFS 3.4, with a RAID box
giving us a partition of about 16 GB.
It used to run quite nicely, until another disk crashed
and we decided to restore all the lost volumes
to this partition (the only place that could hold
all the data at the moment).
Somewhere in the middle of the restore, everything began
to crawl, and we had to stop restoring volumes to it.
It took 2 days before the server was working properly again,
and since then it jumps to >96% iowait as soon as we
try to 'vos move' or restore more accounts to this partition.
To remedy this, we moved the RAID box to another server
running Solaris 2.5.1, but to no avail; it still behaves the same
way. /var/adm/messages doesn't say anything special, so it doesn't
seem to be an OS fault (and we did change the OS version).
There is enough memory on the machine; it doesn't swap at all
during these 'freezes'.
Experimenting a bit with SIGSTOP/SIGCONT on 'volserver' during a
freeze shows that it is this process that causes the iowait:
as soon as we SIGCONT it, it immediately goes back to waiting
for something, and I don't know what.
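
For anyone who wants to repeat the SIGSTOP/SIGCONT experiment, here is
roughly what we did, written as a small shell sketch. The function name
pause_resume is made up for this post, and the pid and the number of
seconds are whatever you choose to pass in; it is only meant to show
the idea, not an exact record of our session.

```shell
# Hypothetical helper (not part of AFS or Solaris): suspend a process
# for a while, then let it continue.
# Usage: pause_resume PID SECONDS
pause_resume() {
    pid=$1
    secs=$2
    kill -STOP "$pid" || return 1   # suspend the process
    sleep "$secs"                   # watch the iowait figures meanwhile
    kill -CONT "$pid"               # resume the process
}
```

Run it with the pid of the volume server process while something like
'iostat -x 5' is running in another window. If the iowait drops while
the process is stopped and climbs again right after it is continued,
that process is the one generating the I/O.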
Is there anyone else who has large partitions and/or has seen
this behaviour before?
It worked fine up to the point where ~12-13 GB of the ~16 GB
available on the partition was in use.