Hi Vic,

>
> My guess is that the route to 10.14.3.0/24 via eth0 is still at the top of
> the table, sending your reply packets into a dead network.  This will stop
> any traffic to 10.14.3.0 from reaching its destination -- in fact, I
> notice that the default route is against eth0 also, so no traffic will be
> able to exit your guest (as you found with your SSH session).  See what
> difference it makes when you configure the eth0 interface down (ifdown
> eth0) -- this will remove the 10.14.3.0/24 via eth0 route.  You still
> won't have a default route, but traffic to the eth1-connected network will
> keep going.
>
That took care of the "inaccessible from guest2" problem!!!  THANKS A
HEAP!!!!

In some of my earlier attempts at automating this, I had included a step to
drop the interface, but somewhere along the line I'd removed it.  You can't
imagine how many hours and different things I tried.  Sometimes you just get
too deep and miss the "little" things...  :-(

>
> Doing this kind of recovery automatically would be tricky.  You could
> implement a scripted process based on Adam Thornton's VRT, or use some of
> the health-checking function from keepalived (for example), to
> automatically configure an interface down and raise an alert when it no
> longer flows traffic.  You could also use a dynamic routing protocol to
> advertise your VIPA address to the network, but this may not be desirable
> to you.
>
What I'm "attempting" to use for the automation is the hotplug events that
chandev spits out.  When I get a "device_gone" event, I drop the interface
and change the default route to the alternate interface.  Then, when I get a
"good" event, I bring the interface back up and leave the default route on
the alternate.
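
For illustration, the event handling described above might look roughly like
the sketch below.  Several things in it are assumptions rather than anything
chandev documents: that the event name and interface arrive as $1 and $2,
that eth1 is the alternate interface, and that 192.0.2.1 (a placeholder
address) is the gateway on that side.  The run() helper and DRY_RUN variable
are made up so the logic can be tried without touching the network.

```shell
#!/bin/sh
# Failover sketch.  ALT_IF and ALT_GW are placeholders, not real values.
ALT_IF=${ALT_IF:-eth1}
ALT_GW=${ALT_GW:-192.0.2.1}

# With DRY_RUN=1, print each command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

handle_event() {
    event=$1
    ifname=$2
    case "$event" in
        device_gone)
            # Drop the dead interface so its connected route goes away,
            # then move the default route to the alternate interface.
            run ifdown "$ifname"
            run ip route replace default via "$ALT_GW" dev "$ALT_IF"
            ;;
        good)
            # Bring the interface back; the default route stays put.
            run ifup "$ifname"
            ;;
        *)
            echo "ignoring event: $event" >&2
            ;;
    esac
}

# Assumed calling convention: <script> <event> <interface>
if [ $# -ge 2 ]; then
    handle_event "$1" "$2"
fi
```

Running it as "DRY_RUN=1 sh failover.sh device_gone eth0" (failover.sh being
whatever the script is saved as) only prints the commands it would issue.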

It seems that the OSA takes care of sending out the ARP updates, so I
"shouldn't" have to worry about it.

Another issue I've run into is when I "plug" the virtual cable back in.  It
can take a really long time for the device to come back and I've had a
couple of instances where it didn't come back at all.  Kept getting (many)
messages like:

Jun  9 12:35:25 guest1 kernel:  qeth: setipm: return code 0xffffffff (IPA communication timeout)
Jun  9 12:35:25 guest1 kernel:  qeth: trying again...

Eventually these stopped and the "normal" messages started:

Jun  9 12:35:48 guest1 kernel:  qeth: couldn't set adapter parameters on irq 0x7: xffffffff
Jun  9 12:35:50 guest1 kernel:  qeth: Could not start ARP processing assist on eth0: 0xffffffff

These "usually" stop after a bit and I receive:

Jun  9 12:30:26 guest1 kernel: qeth: recovered device 0x1000/0x1001/0x1002 (eth0) successfully.

But, a couple of times now, it was not able to recover.  So, I started
playing around with:

echo >/proc/chandev shutdown eth0

when a "device_gone" event is received.  Unfortunately, it never
completes.  Must be some kind of deadlock issue or something.  Actually, it
DID work ONCE and recovery worked a LOT better when I replugged the cable.
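
To tell a recovered device from a stuck one in a script, one option might be
to look for the "recovered device" message quoted above.  This is only a
sketch: qeth_recovered is a made-up helper name, and it assumes the message
appears on a single line in the kernel log with the interface name in
parentheses, as in the excerpt.

```shell
# Read kernel log text on stdin; succeed if qeth reported a successful
# recovery for the interface named in $1.
qeth_recovered() {
    grep -q "qeth: recovered device .*($1) successfully"
}
```

It could be called as "dmesg | qeth_recovered eth0".  Since dmesg also holds
older messages, a real check would have to look only at lines logged after
the "device_gone" event.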

Anyway, thanks again for knocking me upside the noggin!

Leland
