We have been using Transarc AFS to copy substantial numbers of
files, some of which are large (1 MB or more), from a site in Utah to our
Sun in Texas. When we do so, the operation starts but regularly fails, usually
with the diagnostic 'cp: Connection timed out', though sometimes we are
notified that our ticket has expired.
After such failures, we find the following in our system logfile:
- Jul 22 19:52:48 azathoth vmunix: afs: Lost contact with file server 128.110.4.156 in cell css.cs.utah.edu
- Jul 22 19:53:49 azathoth vmunix: afs: file server 128.110.4.156 in cell css.cs.utah.edu is back up
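For reference, the gap between those two log entries can be worked out as in the sketch below. It is only an illustration: the function name outage_seconds is ours, the year is a placeholder because syslog lines carry no year, and it assumes an English locale for the month abbreviation.

```python
from datetime import datetime

# syslog-style timestamps carry no year, so one is supplied as a placeholder.
STAMP_FORMAT = "%Y %b %d %H:%M:%S"

def outage_seconds(lost, restored, year=2000):
    """Seconds between the 'Lost contact' and 'is back up' timestamps.

    Assumes both entries fall in the same (placeholder) year.
    """
    start = datetime.strptime(f"{year} {lost}", STAMP_FORMAT)
    end = datetime.strptime(f"{year} {restored}", STAMP_FORMAT)
    return int((end - start).total_seconds())

print(outage_seconds("Jul 22 19:52:48", "Jul 22 19:53:49"))  # prints 61
```

In other words, the log shows contact with the file server being lost for roughly one minute.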
When we break up the operation into several smaller copies, we have a greater
likelihood of success. During business hours the number/size of files that will
successfully transfer is smaller than at night. Even as the AFS transfer is
failing, FTP on the same machine to another remote site (France) may be
working, so the Internet is functional at least up to the last node common
to the routes from Richardson to Utah and France. Our line out is only
56 Kb/s.
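To put the line speed in perspective, here is a rough estimate of the best-case time for one of the larger files. It is only a sketch: it assumes 1 MB means 2**20 bytes, that the line carries 56,000 bits per second, and that there is no protocol overhead or competing traffic, none of which we have measured. The function name is just for illustration.

```python
def transfer_seconds(size_bytes, link_bits_per_second=56_000):
    """Ideal time to move size_bytes over the link, ignoring all overhead."""
    return size_bytes * 8 / link_bits_per_second

ONE_MB = 2**20  # assumption: 1 MB taken as 1,048,576 bytes

print(round(transfer_seconds(ONE_MB)))  # prints 150
```

Even in this ideal case, a single 1 MB file occupies the line for about two and a half minutes.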
This is a critical activity that we perform regularly and in large volume (it
is in fact the reason we bought AFS), so we would appreciate some assistance
in getting to the root of the problem.
Thanks,
Ed Harbin/Convex