> -----Original Message-----
> From: Jon Hart [mailto:[EMAIL PROTECTED] 
> Sent: Friday, April 07, 2006 1:25 PM
> To: Barry, Christopher
> Cc: [email protected]
> Subject: Re: IO fencing question
> 
> On Fri, Apr 07, 2006 at 12:26:45PM -0400, Barry, Christopher wrote:
> >     Thanks much for your answers. By 'soft', I mean a controlled
> > reboot/shutdown where the power remains on even though the OS has
> > obviously stopped running. I have not experienced any actual failures of
> > anything, so I do not know the outcome of that. Induced 'Hard' failure
> > (e.g. pulling the plug) works perfectly.
> > 
> >     The more I look at it, and think about it, I'm guessing the
> > problem is more related to the redundant fibre ports on the 350-24T
> > switch, actually holding onto information about the directly connected
> > interface, and stubbornly sticking to it if it detects any kind of
> > signal whatsoever.
> 
> I experienced this same sort of weirdness when setting up a pair of
> redundant routers.  The two upstreams, which I had no control over, ran
> OSPF.  If I powered off the machine, all was well.  If I simply halted
> the machine, or there was power to it at all, their OSPF daemon would
> detect a link and continue to route in the direction of our downed
> router.
> 
> The problem, in the end, was that the Dell 1850's "primary" onboard
> ethernet controller will exhibit link when there is power to the board.
> The secondary, and any PCI/PCI-X cards that we added on afterward, did
> not exhibit this behavior.
> 
> -jon
> 


Thanks everyone for your ideas on this. As it turns out, the issue is
indeed the switch's redundant fiber port not releasing. As soon as power
hits the server's motherboard, a link is present on the switch - even
though all of my fiber NICs are in PCI slots. The only way I can
reliably fail over the switch port is to remove power completely from the
router.

To do this, I'm thinking a combination of:
<http://freshmeat.net/projects/powerswitch/>
and:
<http://www.servertech.com/products/product.aspx?GroupID=1&ProductID=12#>

Of course the powerswitch script will need a bit of hacking, and I'll
need to wrap the whole deal in a looping test script that watches for
stge0 on the backup becoming master. Then I'm thinking of attempting
an 'ssh master "halt -p"' (no -c, since ssh treats that flag as a cipher
selection), waiting a set number of seconds, and then switching off the
power to the plug.
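
Roughly, the wrapper could look like the sketch below. It is in Python
rather than shell only so the pieces can be exercised with stubs; none of
it is tested against real hardware. The names fence_on_takeover and
halt_master are made up for illustration, and the is_master and power_off
callables are placeholders: how to detect that stge0 has become master,
and how to drive the power switch, are assumptions left to whatever the
failover mechanism and the hacked powerswitch script end up providing.

```python
import subprocess
import time


def halt_master(host="master", timeout=30):
    """Ask the old master to power itself down; True if ssh exited cleanly."""
    try:
        result = subprocess.run(
            # BatchMode stops ssh from hanging on a password prompt.
            ["ssh", "-o", "BatchMode=yes", host, "halt", "-p"],
            timeout=timeout,
        )
        return result.returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        return False


def fence_on_takeover(is_master, halt, power_off, grace_seconds=30,
                      poll_seconds=2, sleep=time.sleep):
    """Wait until is_master() is true, then halt the peer and cut its power.

    is_master and power_off are hypothetical callables supplied by the
    caller. Power is cut even when the halt attempt fails, since a hung
    master is the case where the switch port would otherwise stay linked.
    """
    while not is_master():
        sleep(poll_seconds)
    halted = halt()
    sleep(grace_seconds)  # give a clean shutdown time to finish
    power_off()
    return halted
```

The halt result is returned only so the caller can log whether the
shutdown was clean; the power cut does not depend on it.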

Does that sound like a reasonable approach? Anyone already done this and
have some lessons for me?


Thanks,
-C
