Hi Vic,

> My guess is that the route to 10.14.3.0/24 via eth0 is still at the top of
> the table, sending your reply packets into a dead network. This will stop
> any traffic to 10.14.3.0 from reaching its destination -- in fact, I
> notice that the default route is against eth0 also, so no traffic will be
> able to exit your guest (as you found with your SSH session). See what
> difference it makes when you configure the eth0 interface down (ifdown
> eth0) -- this will remove the 10.14.3.0/24 via eth0 route. You still
> won't have a default route, but traffic to the eth1-connected network will
> keep going.

That took care of the "inaccessible from guest2" problem!!! THANKS A HEAP!!!!
In some of my earlier attempts at automating this, I had included a step to drop the interface, but somewhere along the line I'd removed it. You can't imagine how many hours I spent and how many different things I tried. Sometimes you just get too deep and miss the "little" things... :-(

> Doing this kind of recovery automatically would be tricky. You could
> implement a scripted process based on Adam Thornton's VRT, or use some of
> the health-checking function from keepalived (for example), to
> automatically configure an interface down and raise an alert when it no
> longer flows traffic. You could also use a dynamic routing protocol to
> advertise your VIPA address to the network, but this may not be desirable
> to you.

What I'm "attempting" to use for the automation is the hotplug events that chandev spits out. When I get a "device_gone" event, I drop the interface and change the default route to the alternate interface. Then, when I get a "good" event, I bring the interface back up and leave the default route on the alternate. It seems that the OSA takes care of sending out the ARP updates, so I "shouldn't" have to worry about it.

Another issue I've run into is when I "plug" the virtual cable back in. It can take a really long time for the device to come back, and I've had a couple of instances where it didn't come back at all. I kept getting (many) messages like:

    Jun 9 12:35:25 guest1 kernel: qeth: setipm: return code 0xffffffff (IPA communication timeout)
    Jun 9 12:35:25 guest1 kernel: qeth: trying again...

Eventually these stopped and the "normal" messages started:

    Jun 9 12:35:48 guest1 kernel: qeth: couldn't set adapter parameters on irq 0x7: xffffffff
    Jun 9 12:35:50 guest1 kernel: qeth: Could not start ARP processing assist on eth0: 0xffffffff

These "usually" stop after a bit and I receive:

    Jun 9 12:30:26 guest1 kernel: qeth: recovered device 0x1000/0x1001/0x1002 (eth0) successfully.

But, a couple of times now, it was not able to recover.
So, I started playing around with:

    echo >/proc/chandev shutdown eth0

when a "device_gone" event is received. Unfortunately, it never completes. Must be some kind of deadlock issue or something. Actually, it DID work ONCE, and recovery worked a LOT better when I replugged the cable.

Anyway, thanks again for knocking me upside the noggin!

Leland
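
P.S. A rough sketch of the hotplug handling described above, not the actual script. Only the "device_gone" event name comes from this message. The "device_ok" name for the "good" event, the way the event and interface names reach the handler, and the 192.0.2.1 gateway are placeholders assumed for illustration. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the failover logic: drop the failed interface and move the
# default route to the alternate; on recovery, bring the interface back up
# but leave the default route where it is.
# ASSUMPTIONS: "device_ok", the argument order, and ALT_GATEWAY are
# hypothetical; only "device_gone" is named in the message above.

PRIMARY=eth0            # interface whose "cable" gets pulled
ALTERNATE=eth1          # interface that stays connected
ALT_GATEWAY=192.0.2.1   # placeholder gateway reachable via $ALTERNATE
DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# handle_event EVENT INTERFACE
handle_event() {
    event=$1
    iface=$2

    # Only react to events for the primary interface.
    [ "$iface" = "$PRIMARY" ] || return 0

    case "$event" in
        device_gone)
            # Taking the interface down also removes its routes.
            run ifdown "$iface"
            run ip route replace default via "$ALT_GATEWAY" dev "$ALTERNATE"
            ;;
        device_ok)
            # Bring the interface back; the default route stays on the alternate.
            run ifup "$iface"
            ;;
        *)
            echo "ignoring unknown event: $event" >&2
            ;;
    esac
}
```

With DRY_RUN left at 1, calling `handle_event device_gone eth0` prints the ifdown and route commands instead of running them, which makes it easy to check the sequence before wiring it into the real hotplug script.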
