On Oct 4, 2013, at 3:48 PM, "Collinson.Shannon" <[email protected]> wrote:

We do have some F5 appliances, but I think that's just for network routing--if 
there's any kind of F5 that could help manage an HA solution (making a passive 
server active or something like that), we don't have them.

On some platforms, F5 used to ship an agent that could communicate system load 
info back to the load balancer to help it make decisions about where to send 
work, as well as taking some response-time measurements (Cisco and Alteon used
to do something similar). I think all of those vendors have stopped doing that
because of the long-term maintenance implications; most now rely on a simpler
heartbeat probe from the load balancer, plus measuring transaction pass-through
duration, to make guesstimates about load. I don't think there are hooks
to trigger host failover; it just notices the node in question stops responding 
and drops it out of the pool for that service after a timeout threshold is 
reached.
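The probe-and-drop logic is simple enough to sketch; something like the script
below, where the host, port, and thresholds are all made up for illustration
(real balancers do this natively, with tunable monitors):

```shell
#!/bin/bash
# Sketch of a load-balancer-style heartbeat probe (illustrative only;
# the target and thresholds are hypothetical, not an F5 default).
HOST="${1:-127.0.0.1}"
PORT="${2:-1}"        # port 1 is almost certainly closed, so the probe fails
MAX_FAILS=3           # consecutive misses before the node leaves the pool

probe() {
    # A TCP connect attempt capped at 2 seconds; non-zero exit = no heartbeat.
    timeout 2 bash -c "exec 3<>/dev/tcp/${HOST}/${PORT}" 2>/dev/null
}

fails=0
while [ "$fails" -lt "$MAX_FAILS" ]; do
    if probe; then
        fails=0
        echo "node healthy"
        break
    fi
    fails=$((fails + 1))
done

if [ "$fails" -ge "$MAX_FAILS" ]; then
    echo "node dropped from pool"
fi
```

Once the timeout threshold is hit, the node simply stops receiving new work;
nothing on the host side is told to fail over.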

And we are looking at Oracle RAC as a possibility for the Oracle databases
we're hoping to migrate to zlinux, but the Oracle "platform owner" wanted to
explore something cheaper (keeping the critical Oracle stuff that requires RAC
on the midrange servers it currently uses,

Yeah. RAC works really well, but at a price.


but looking for some poor-man's HA to at least provide active-passive support
on zlinux).  And we tried out the MQ multi-instance setup for our WebSphere MQ
and Broker servers but could never get it working as advertised, so that'll be
another application we'll need to support.

Funny enough, that's the application we've been spending the most time on. :)
We found several bugs in the MQ tools that are supposed to verify integrity on
the shared disk; they generate false positives for failure cases. You may have
hit those bugs - check the latest MQ service stream.


We're also playing with scripted HA for that, which would use shared disks
across the servers, managed by testing whether the logical volume is in use--a
really homegrown solution which I think would be more problematic than LinuxHA.

SSI makes this somewhat easier, but there are a LOT of ugly edge cases here. 
You have to coordinate activity at both VM and Linux layers, so you'll need 
tooling at both levels to make this work. Here be big hungry dragons. A 
cluster-aware file system is almost mandatory, given Linux' assumption that 
caching data in RAM is always a good idea for I/O avoidance.
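For the "is the logical volume in use?" test you mention, the usual trick is
the lv_attr field from lvs; a sketch below (the device path in the comment is
invented):

```shell
#!/bin/bash
# Sketch of an "is this LV in use?" check. The lv_attr parsing is standard
# LVM; the volume group and LV names mentioned are hypothetical.
# The 6th character of lv_attr is 'o' when the LV's device is open,
# i.e. mounted or otherwise held by some node.
lv_attr_open() {
    attr="$1"                     # e.g. "-wi-ao----" from lvs -o lv_attr
    [ "$(printf '%s' "$attr" | cut -c6)" = "o" ]
}

# In real use you'd fetch the attribute first, something like:
#   attr=$(lvs --noheadings -o lv_attr /dev/vgshared/lvdata | tr -d ' ')
if lv_attr_open "-wi-ao----"; then
    echo "LV is open; do not activate here"
else
    echo "LV is free"
fi
```

Note this only tells you the device is open *somewhere*; without a cluster
lock manager it can't tell you it will stay that way, which is exactly where
the dragons live.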

If you try the DIY route, read up on the hardware requirements for shared disk
(especially if multiple CECs are involved), and remember to turn MDC (minidisk
cache) off on the minidisks you're sharing. Otherwise, you'll be very, very sad.
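In the user directory that looks roughly like the fragment below (user, volume,
and device addresses are invented); MINIOPT NOMDC has to immediately follow the
MDISK statement it modifies:

```
USER LNXHA1 ...
*  Shared data disk: multi-write link mode, and no minidisk cache
   MDISK 0201 3390 0001 3338 SHRVOL MWV
   MINIOPT NOMDC
```

There's also a CP SET MDCACHE command if you need to change it on a running
system rather than in the directory.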


When you say the setup for LinuxHA is complicated, how bad is it?  Did you have
to resort to bugging SuSE for configuration help, or were you able to work it
all through with the documentation on the org site, maybe polling the
interested-users list for it?  Not that I think we're anywhere near as
knowledgeable as you and your team with zlinux, but if you guys had to go to
the vendor for assistance, we shouldn't even contemplate it!  And I just got
word that, okay, yeah, we can add LinuxHA to the running for the "generic HA"
solution we're looking to find.  (I guess reorgs are good on rare occasions,
such as moving folks who are obstinate about what seem like good ideas...)

In HAO, this is driven by an XML-based config file, either hand-created or
generated via a web-based console on a designated cluster-manager system, which
is then deployed to all the individual cluster nodes in order to avoid a SPOF.
IIRC, LinuxHA has something similar for the configuration process, but I'm not
sure whether it includes a console (I haven't looked at it in a while). The
presentation I sent you off-list has screenshots of how the console works
(copies online at
http://www.sinenomine.net/sites/www.sinenomine.net/files/High%20Availability.pdf
 )
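For flavor, a cluster definition in that general style looks something like the
fragment below. This is purely illustrative -- the element and attribute names
are invented, not the actual HAO (or LinuxHA) schema:

```
<!-- Illustrative only: not a real HAO or LinuxHA configuration file. -->
<cluster name="zlinux-ha">
  <node id="lnxha1" role="active"/>
  <node id="lnxha2" role="passive"/>
  <resource name="oracle-db" type="service">
    <monitor interval="10s" timeout="30s"/>
    <failover target="lnxha2"/>
  </resource>
</cluster>
```

The point of pushing an identical copy to every node is that any surviving node
can take over cluster management if the manager itself dies.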

Whatever we come up with would be something we hope could be exploited on all
Linux servers (on any platform) at SunTrust--chances are, it'd only be
cost-effective and training-effective if it were common, and right now we
actually don't have any standard HA product on our Intel Linux side either, so
this'd be a good time to find one.

Good plan. No need to invent two different wheels.


----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
----------------------------------------------------------------------
For more information on Linux on System z, visit
http://wiki.linuxvm.org/