Leland, I'm guessing that the SSH session that is freezing is originating somewhere outside the 10.14.3.0/24 network (see later).

I have found that although the QETH driver interprets a 'cable-pull' event when a NIC is uncoupled, the interface is often still marked 'UP' (this is definitely the case when a simulated NIC is uncoupled from a Guest LAN). When you have disconnected the NIC, check what state the interface is in and what the route table shows. My guess is that the route to 10.14.3.0/24 via eth0 is still at the top of the table, sending your reply packets into a dead network. This will stop any traffic to 10.14.3.0/24 from reaching its destination -- in fact, I notice that the default route also goes via eth0, so no traffic will be able to exit your guest (as you found with your SSH session).

See what difference it makes when you configure the eth0 interface down (ifdown eth0) -- this will remove the 10.14.3.0/24 via eth0 route. You still won't have a default route, but traffic to the eth1-connected network will keep going.

Also, for inbound traffic, if the ARP cache in the client machines is holding the MAC address of OSA1, the other OSA will not be used as an inbound path until the cache in that client is cleared. If this client happens to be your router...

Doing this kind of recovery automatically would be tricky. You could implement a scripted process based on Adam Thornton's VRT, or use some of the health-checking functions from keepalived (for example), to automatically configure an interface down and raise an alert when it no longer passes traffic. You could also use a dynamic routing protocol to advertise your VIPA address to the network, but this may not be desirable to you.

Hope this helps; get back to us with the results of your tests.

Cheers,
Vic

On Mon, 9 Jun 2003, Lucius, Leland wrote:

> I "think" I almost have it working, but I just can't get it all the way. I
> have read and read and read as much info as I could find about it and, near
> as I can tell, I've done everything correctly. The problem is that after
> takeover, guests sharing the same OSA can no longer talk to each other until
> the failed OSA is back up...
