On 16 Feb 2012, at 18:00, Mark Grennan wrote:

> Yes HA systems are very confusing.

It's not so much that - it's more that heartbeat/crm/pacemaker/corosync is confusing, not least because it keeps changing its name. Constant changing of names, nomenclature and config settings guarantees that any articles written about it won't work for long.

> Pacemaker is the name of an older application. Corasync is it's new name but
> some of the files still maintain the old name.

Huh? So why does corosync need setting up to work with pacemaker if it is now pacemaker? Even your doc installs them (and heartbeat) from separate packages!

> One Issue I can think of is, Pacemaker wants to bind the floating IP as
> eth#:#, while MMM wants to use a different method that can only be seen with
> the IP command. I think they are fighting over who owns the floating IP.

But pacemaker isn't even running on the machines the mmm float is on! It's somehow interfering with the monitoring node, not the float that it's managing. I don't have a problem with using the ip command - I was under the impression it's how things are supposed to be done now? I've seen mixtures of ifconfig-style network config coexisting quite happily with ip-style ones before.

My original config:

server1: pacemaker
server2: pacemaker
server3: mmm monitor
server4: mmm agent
server5: mmm agent

There is a floating IP on servers 1 and 2, and another one on servers 4 and 5.

What I want to change to:

server2: pacemaker
server3: pacemaker + mmm monitor
server4: mmm agent
server5: mmm agent

Here there is a floating IP on 2 and 3, and another on 4 and 5. I don't see any reason they should conflict since there is no overlap of machines that floats are on.

What seems to happen is that as soon as corosync is started, the mmm monitor can no longer see the network at all. I suspect this could be something to do with the suggested setting of using the network address for bindnetaddr in corosync. I'm still mystified by whether I should use ucast, mcast or bcast - previous setups I've done with crm have used ucast.
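
For reference, "the network address" of an interface is its IP address with the host bits masked off. A minimal Python sketch of that calculation follows. The addresses and the helper name bindnetaddr_for are made up for illustration, and nothing here shows whether this setting is actually behind the mmm monitor problem.

```python
import ipaddress


def bindnetaddr_for(ip: str, netmask: str) -> str:
    """Return the network address for an interface given its IP and netmask.

    Hypothetical helper: this is the value the guides suggest putting in
    bindnetaddr, derived by masking off the host bits of the address.
    """
    iface = ipaddress.ip_interface(f"{ip}/{netmask}")
    return str(iface.network.network_address)


# Example addresses are assumptions, not taken from the cluster above.
print(bindnetaddr_for("192.168.1.23", "255.255.255.0"))    # 192.168.1.0
print(bindnetaddr_for("10.0.5.130", "255.255.255.128"))    # 10.0.5.128
```

Two nodes on the same subnet produce the same value, which is presumably why the guides suggest it: the same config file can be copied to every node unchanged.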

I see in your example you're binding to a private IP for corosync, but I can't understand why you're using a public IP for mcast, or why it's even there at all.

Your guide wasn't one of the ones I'd found, so thanks for the pointer. The most interesting one for me was this one, since it is closest to my own config and seems quite recent (i.e. it even mentions corosync):

https://wiki.ubuntu.com/ClusterStack/LucidTesting

The official 'cluster from scratch' PDF skips over quite a few bits of vital info, so I found I couldn't really use it.

My mmm config was originally installed by Percona, and I've done several others since. mmm has always worked beautifully for me (even through multiple hardware and network failures), and the main complaint I've seen about it (1062 errors) is nothing to do with mmm. I fully understand that it has problems; however, it has the advantage of being very stable and trivially easy to understand and configure.

While I keep reading good things about pacemaker, the practical aspects of getting it to work have always turned into a yak-shaving festival, so I've always been put off pursuing it for anything beyond management of a single IP. One critical aspect of an HA system is that it should be really easy to deal with when things go wrong; I'd put xtrabackup in this category - it's great (though I hope you have automated tests for your restores, as it went through a rough patch late last year when restores were broken!).

Marcus
--
Marcus Bointon
Synchromedia Limited: Creators of http://www.smartmessages.net/
UK info@hand CRM solutions
[email protected] | http://www.synchromedia.co.uk/

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
