On 16 Feb 2012, at 18:00, Mark Grennan wrote:

> Yes HA systems are very confusing.

It's not so much that - it's more that heartbeat/crm/pacemaker/corosync is 
confusing, not least because it keeps changing its name. Constant changing of 
names, nomenclature and config settings guarantees that any articles written 
about it won't work for long.

> Pacemaker is the name of an older application.  Corosync is its new name but 
> some of the files still maintain the old name.

Huh? So why does corosync need setting up to work with pacemaker if it is now 
pacemaker? Even your doc installs them (and heartbeat) from separate packages!

> One issue I can think of is, Pacemaker wants to bind the floating IP as 
> eth#:#, while MMM wants to use a different method that can only be seen with 
> the ip command.   I think they are fighting over who owns the floating IP.

But pacemaker isn't even running on the machines the mmm float is on! It's 
somehow interfering with the monitoring node, not the float that it's managing. 
I don't have a problem with using the ip command - I was under the impression 
it's how things are supposed to be done now? I've seen mixtures of 
ifconfig-style network config coexisting quite happily with ip-style ones 
before.

My original config:

server1: pacemaker
server2: pacemaker
server3: mmm monitor
server4: mmm agent
server5: mmm agent

There is a floating IP on servers 1 and 2, and another one on servers 4 and 5.

What I want to change to:

server2: pacemaker
server3: pacemaker + mmm monitor
server4: mmm agent
server5: mmm agent

Here there is a floating IP on 2 and 3, and another on 4 and 5. I don't see any 
reason they should conflict, since no machine hosts both floats. What seems to 
happen is that as soon as corosync is started, the mmm monitor can no longer 
see the network at all. I suspect this could be something to do with the 
suggested setting of using the network address for bindnetaddr in corosync.
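
For reference, the "network address" that guides suggest for bindnetaddr is 
just the interface address with the host bits masked off. A minimal Python 
sketch of that arithmetic follows; the function name and the example addresses 
are made up for illustration, and it only shows how the value is derived, not 
whether this setting is what breaks the mmm monitor.

```python
import ipaddress


def bindnetaddr_for(interface_cidr: str) -> str:
    """Return the network address for an interface given as 'address/prefix'.

    Hypothetical helper: it only does the masking arithmetic and knows
    nothing about corosync itself.
    """
    iface = ipaddress.ip_interface(interface_cidr)
    return str(iface.network.network_address)


if __name__ == "__main__":
    # 192.168.10.23/24 is an invented example, not an address from this thread.
    print(bindnetaddr_for("192.168.10.23/24"))
```

Running it against each interface on the monitor node would show which 
interface a given bindnetaddr value matches.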

I'm still mystified by whether I should use ucast, mcast or bcast - previous 
setups I've done with crm have used ucast. I see in your example you're binding 
to a private IP for corosync, but I can't understand why you're using a public 
IP for mcast, or why it's even there at all.
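
One possibility, and it is only a guess without the config in front of me, is 
that the address given for mcast is a multicast group address rather than a 
public host address; the two are easy to mistake for each other. A small 
Python sketch to tell them apart (the function name and sample addresses are 
illustrative assumptions, not values from the guide):

```python
import ipaddress


def classify(addr: str) -> str:
    """Roughly classify an IPv4/IPv6 address for cluster-config purposes."""
    ip = ipaddress.ip_address(addr)
    if ip.is_multicast:
        # 224.0.0.0/4 for IPv4: a group address, not a host on any network.
        return "multicast"
    if ip.is_private:
        return "private unicast"
    return "other unicast"


if __name__ == "__main__":
    # Sample addresses chosen for illustration only.
    for sample in ("226.94.1.1", "10.0.0.5", "8.8.8.8"):
        print(sample, classify(sample))
```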

Your guide wasn't one of the ones I'd found, so thanks for the pointer. The 
most interesting one for me was this one, since it is closest to my own config 
and seems quite recent (i.e. it even mentions corosync): 
https://wiki.ubuntu.com/ClusterStack/LucidTesting
The official 'Clusters from Scratch' PDF skips over quite a few bits of vital 
info, so I found I couldn't really use it.

My mmm config was originally installed by Percona, and I've done several others 
since. mmm has always worked beautifully for me (even through multiple hardware 
and network failures), and the main complaint I've seen about it (1062 errors) 
is nothing to do with mmm. I fully understand that it has problems; however, it 
has the advantage of being very stable and trivially easy to understand and 
configure. While I keep reading good things about pacemaker, the practical 
aspects of getting it to work have always turned into a yak-shaving festival, 
so I've always been put off pursuing it for anything beyond management of a 
single IP. One critical aspect of an HA system is that it should be really easy 
to deal with when things go wrong; I'd put xtrabackup in this category - it's 
great (though I hope you have automated tests for your restores, as there was a 
spell late last year when restores were broken!).

Marcus
-- 
Marcus Bointon
Synchromedia Limited: Creators of http://www.smartmessages.net/
UK info@hand CRM solutions
[email protected] | http://www.synchromedia.co.uk/



_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
