> -----Original Message-----
> From: Jon Hart [mailto:[EMAIL PROTECTED]
> Sent: Friday, April 07, 2006 1:25 PM
> To: Barry, Christopher
> Cc: [email protected]
> Subject: Re: IO fencing question
>
> On Fri, Apr 07, 2006 at 12:26:45PM -0400, Barry, Christopher wrote:
> > Thanks much for your answers. By 'soft', I mean a controlled
> > reboot/shutdown where the power remains on even though the OS has
> > obviously stopped running. I have not experienced any actual
> > failures of anything, so I do not know the outcome of that. Induced
> > 'Hard' failure (e.g. pulling the plug) works perfectly.
> >
> > The more I look at it, and think about it, I'm guessing the
> > problem is more related to the redundant fibre ports on the 350-24T
> > switch, actually holding onto information about the directly connected
> > interface, and stubbornly sticking to it if it detects any kind of
> > signal whatsoever.
>
> I experienced this same sort of weirdness when setting up a pair of
> redundant routers. The two upstreams, which I had no control over, ran
> OSPF. If I powered off the machine, all was well. If I simply halted
> the machine, or there was any power to it at all, their OSPF daemon
> would detect a link and continue to route in the direction of our
> downed router.
>
> The problem, in the end, was that the Dell 1850's "primary" onboard
> ethernet controller will exhibit link when there is power to the board.
> The secondary, and any PCI/PCI-X cards that we added on afterward, did
> not exhibit this behavior.
>
> -jon

Thanks everyone for your ideas on this.

As it turns out, the issue is indeed the switch's redundant fiber port not releasing. As soon as power hits the server's motherboard, a link is present on the switch - even though all of my fiber NICs are in PCI slots. The only way I can reliably fail over the switch port is to remove power completely from the router.

To do this, I'm thinking of a combination of:
<http://freshmeat.net/projects/powerswitch/>
and:
<http://www.servertech.com/products/product.aspx?GroupID=1&ProductID=12#>

Of course the powerswitch script will need a bit of hacking, and I'll need to wrap the whole deal in a looping test script that watches for stge0 on the backup becoming master. Then I'm thinking of attempting an 'ssh master "halt -p"', waiting a set number of seconds, and then switching off the power to the plug.

Does that sound like a reasonable approach? Has anyone already done this, and do you have any lessons for me?

Thanks,
-C
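
For reference, here is a rough Python sketch of the wrapper loop described above: poll until the backup becomes master, try a clean halt of the old master over ssh, wait a grace period, then cut power. It is only an outline under assumptions. The three actions are passed in as callables because the real commands are not settled yet: the master check and the outlet-off command are hypothetical placeholders, not the actual interface of the powerswitch script or the power strip.

```python
import subprocess
import time


def run(cmd, timeout=30):
    """Run a command list; return True on exit status 0, False otherwise.

    A timeout or a missing executable is treated as failure, not an error.
    """
    try:
        return subprocess.run(cmd, timeout=timeout).returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        return False


def fence_when_master(is_master, soft_halt, power_off,
                      poll_s=2.0, grace_s=30.0, sleep=time.sleep):
    """Wait for takeover, then fence the old master.

    is_master, soft_halt and power_off are caller-supplied callables
    (hypothetical hooks) that each return True on success.
    """
    # Block until the backup reports that it has become master.
    while not is_master():
        sleep(poll_s)

    # Ask the old master to shut down cleanly. A failure here is expected
    # when the box is already dead, so it does not stop the sequence.
    halted = soft_halt()

    # Give the OS time to finish shutting down before power is removed.
    sleep(grace_s)

    # Always cut power: only removing power drops the link on the switch.
    powered_off = power_off()
    return halted, powered_off
```

It could be wired up with something like `fence_when_master(lambda: run([CHECK_CMD]), lambda: run(["ssh", "master", "halt -p"]), lambda: run([OUTLET_OFF_CMD]))`, where `CHECK_CMD` and `OUTLET_OFF_CMD` stand in for whatever commands end up testing stge0 and switching the outlet. Power is cut even if the ssh halt fails, since a hung or dead master is exactly the case where the switch port has to be forced to release.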

