On Oct 4, 2013, at 3:48 PM, "Collinson.Shannon" <[email protected]> wrote:
> We do have some F5 appliances, but I think that's just for network
> routing--if there's any kind of F5 that could help manage an HA solution
> (making a passive server active, or something like that), we don't have
> them.

On some platforms, F5 used to ship an agent that could communicate system load information back to the load balancer to help it decide where to send work, as well as take some response-time measurements (Cisco and Alteon used to do something similar). I think all of those companies have stopped doing that because of the long-term maintenance implications; most now rely on a simpler heartbeat probe from the load balancer, plus measured transaction pass-through duration, to make guesstimates about load. I don't think there are hooks to trigger host failover; the load balancer just notices that the node in question has stopped responding and drops it out of the pool for that service once a timeout threshold is reached.

> And we are looking at Oracle RAC as a possibility for the Oracle
> databases we're hoping to migrate to zLinux, but the Oracle "platform
> owner" wanted to explore something cheaper (keeping the critical Oracle
> stuff that requires RAC on the midrange servers it currently uses,

Yeah. RAC works really well, but at a price.

> but looking for some poor-man's HA to at least provide active-passive
> support on zLinux). And we tried out the MQ multi-instance setup for our
> WebSphere MQ and Broker servers but could never get it working as
> advertised, so that'll be another application we'll need to support.

Funny enough, that's the application we've been spending the most time on. :) We found several bugs in the MQ tools that are supposed to verify integrity on the shared disk; they generate false positives for failure cases. You may have hit those bugs--check the latest MQ service stream.
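The heartbeat-plus-timeout behavior described above can be sketched roughly as follows. This is a minimal illustration of the general idea, not any vendor's actual algorithm; the node names, threshold, and "DROP" action are all made up:

```shell
#!/bin/sh
# Sketch of a load-balancer-style heartbeat check: count consecutive
# probe failures per node and drop the node from the service pool once
# a threshold is reached. A successful probe resets the counter.
# (Illustrative only -- names and threshold are invented.)

THRESHOLD=3

# record_probe NODE RESULT
#   RESULT is 0 for a healthy probe, nonzero for a failed one.
#   Prints "DROP NODE" the moment the node crosses the threshold.
record_probe() {
    node=$1 result=$2
    var="fails_$node"
    count=$(eval "echo \${$var:-0}")
    if [ "$result" -eq 0 ]; then
        count=0                      # healthy probe resets the counter
    else
        count=$((count + 1))
        if [ "$count" -eq "$THRESHOLD" ]; then
            echo "DROP $node"        # remove node from the pool
        fi
    fi
    eval "$var=$count"
}

# Example: node2 misses three probes in a row and gets dropped.
record_probe node1 0
record_probe node2 1
record_probe node2 1
record_probe node2 1   # third consecutive failure
```

A real probe would be something like a TCP connect or HTTP GET against the service port on a timer; the point here is just the consecutive-failure bookkeeping that stands in for "host failover hooks" that don't exist.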
> We're also playing with scripted HA for that, which would use shared
> disks across the servers, managed by testing to see whether the logical
> volume was in use--a really homegrown solution which I think would be
> more problematic than LinuxHA.

SSI makes this somewhat easier, but there are a LOT of ugly edge cases here. You have to coordinate activity at both the VM and Linux layers, so you'll need tooling at both levels to make this work. Here be big, hungry dragons. A cluster-aware file system is almost mandatory, given Linux's assumption that caching data in RAM is always a good idea for I/O avoidance. If you try the DIY route, read up on the hardware requirements for shared disk (especially if multiple CECs are involved), and remember to turn MDC off on the minidisks you're sharing. Otherwise, you'll be very, very sad.

> When you say the setup for LinuxHA is complicated, how bad is it? Did
> you have to resort to bugging SuSE for configuration help, or were you
> able to work it all through with the documentation on the org site,
> maybe polling the interested-users list for it? Not that I think we're
> anywhere near as knowledgeable as you and your team with zLinux, but if
> you guys had to go to the vendor for assistance, we shouldn't even
> contemplate it! And I just got word that, okay, yeah, we can add LinuxHA
> to the running for the "generic HA" solution we're looking to find.
> (I guess reorgs are good on rare occasions, such as moving folks
> obstinate to what seem like good ideas...)

In HAO, this is driven by an XML-based config file, either hand-created or built via a web-based console on a designated cluster manager system, and then deployed to all the individual cluster nodes in order to avoid a SPOF. IIRC, LinuxHA has something similar for the configuration process, but I'm not sure whether it includes a console (I haven't looked at it in a while).
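For the "is the logical volume in use" test mentioned above, one homegrown approach is to inspect the `lv_attr` string that `lvs` reports: character 5 is `a` when the LV is active, and character 6 is `o` when it is open (mounted or otherwise held). A sketch, with an invented VG/LV name:

```shell
#!/bin/sh
# Sketch of the "is the logical volume in use?" check described above.
# lvs prints an attribute string such as "-wi-ao----"; character 5 is
# 'a' (active) and character 6 is 'o' (open). VG/LV names below are
# illustrative.

# lv_in_use ATTR -> succeeds (exit 0) if the LV is both active and open.
lv_in_use() {
    attr=$1
    active=$(echo "$attr" | cut -c5)
    open=$(echo "$attr" | cut -c6)
    [ "$active" = "a" ] && [ "$open" = "o" ]
}

# Demonstration with literal attribute strings:
lv_in_use "-wi-ao----" && echo "active and open: in use"
lv_in_use "-wi-a-----" || echo "active but not open: not mounted"

# On a real system you would feed it live output, e.g.:
#   lv_in_use "$(lvs --noheadings -o lv_attr havg/halv | tr -d ' ')"
```

Note the catch that makes this kind of DIY check hairy: `lvs` only reflects the view of the node running it, so deciding whether the *other* node holds the disk still needs cross-node coordination--which is exactly where the dragons live.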
The presentation I sent you off-list has screenshots of how the console works (copies online at http://www.sinenomine.net/sites/www.sinenomine.net/files/High%20Availability.pdf ).

> Whatever we come up with would be something we hope could be exploited
> on all Linux servers (on any platform) at SunTrust--chances are, it'd
> only be cost-effective and training-effective if it was common, and
> right now we actually don't have any standard HA product on our Intel
> Linux side either, so this'd be a good time to find one.

Good plan. No need to invent two different wheels.

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO LINUX-390
or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
----------------------------------------------------------------------
For more information on Linux on System z, visit
http://wiki.linuxvm.org/
