We had a similar problem with RS6000 clients. It seemed to affect only
the RS6000s, and only on large data transfers (e.g. big files or lots
of little ones). It turned out to be a router problem, which went away
when they were upgraded.
- Woody Kellum
"Derrick J. Brashear" <[EMAIL PROTECTED]> writes:
> I posted this to alt.filesystems.afs, but I figured I'd post it here
> too, in case...
> Hi,
>
> I'm having a problem with connections being timed out; namely, when I
> attempt to cd to /afs/club.cc.cmu.edu I get:
> afs: Lost contact with volume location server 128.2.232.226 in cell
> club.cc.cmu.edu
> /afs/club.cc.cmu.edu: Connection timed out.
>
> Of course, it's not really down:
> /usr/local/bin/fs checks -cell club.cc.cmu.edu
> afs: volume location server 128.2.232.226 in cell club.cc.cmu.edu is back
> up
> All servers are running.
>
> From a different machine, I can get there, no problem. The CellServDB
> entry for club.cc.cmu.edu is correct; the machine was rebooted with no
> change in the problem. I removed the cache and rebooted, still no change.
>
> I'm really not sure what I'm looking for. Can anyone give me pointers?
>
> Thanks
> -D
>
>
Derrick,
I've seen something similar in our cell. The AFS servers have multiple
network interfaces, but only one "known" address as far as vlserver and/or
CellServDB are concerned. The client's call to the server goes to the known
interface, but the server's response comes back via another of its interfaces
in order to reduce the load on our routers. The problems occurred only on
client machines that also had multiple network adapters. When the client heard
back from the server via its unadvertised interface, it sent its response out
the wrong adapter. For example:

The AFS server is known by its 129.35.xx.xx address, but its default router is
on its 9.3.xx.xx interface.

The client has a 129.35 interface and a 9.something interface to a lab ring.
It talks to the server via the 129.35 interface, but the response packet has a
9.3.xx.xx source address. When the client attempted to respond to this packet,
TCP/IP routed it out the 9.something interface and it never reached the
server, hence the "Lost contact" message.

Forcing the response out the 129.35 interface on the client with a
"route add -net" statement in rc.tcpip cured the problem.
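
To illustrate why the more specific route helps, here is a rough Python sketch
of a longest-prefix route lookup on the client. It is only a model, not what
AIX actually runs: the interface names, the netmasks, and the 9.3.1.10 reply
address are all hypothetical values chosen for illustration.

```python
import ipaddress


def pick_interface(routes, dest):
    """Return the interface of the most specific route that matches dest."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, ifc) for net, ifc in routes if addr in net]
    if not matches:
        raise LookupError(f"no route to {dest}")
    # Longest prefix wins, as in an ordinary IP routing table lookup.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Hypothetical client routing table before the fix (netmasks are assumed).
before = [
    (ipaddress.ip_network("129.35.0.0/16"), "if_129_35"),
    (ipaddress.ip_network("9.0.0.0/8"), "if_9_lab_ring"),
]

# Hypothetical source address of the server's reply (its 9.3 interface).
server_reply_source = "9.3.1.10"

# Before the fix, the client's answer leaves via the lab ring.
print(pick_interface(before, server_reply_source))

# The "route add -net" fix, modelled as a more specific route for the
# server's 9.3 network that points out the 129.35 interface.
after = before + [(ipaddress.ip_network("9.3.0.0/16"), "if_129_35")]
print(pick_interface(after, server_reply_source))
```

With these assumed values the first lookup selects the lab ring interface and
the second selects the 129.35 interface, which is the effect the added route
had on the real client.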
David
----------------------------------------------------------------
David Littlewood Tie Line 793-8832
email: [EMAIL PROTECTED] Phone 512-823-8832