On Fri, May 16, 2008 at 11:41 AM, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
> Hi,
>
> On Thu, May 15, 2008 at 11:45:59AM -0700, Ryan Ernst wrote:
>> Hi,
>>
>> I'm still trying to get heartbeat working on ec2. I am using ucast as
>> previously directed by this list.
>>
>> I've set a logfile on each box so I can see what they are doing. I'm
>> currently testing with 3 nodes. My ha.cf looks like this:
>>
>> logfacility local0
>> # ucast members - everything but this server
>
> You can have an ucast directive for this server too.
>
>> ucast eth0 ip-10-251-43-97
>> ucast eth0 ip-10-251-27-191
>>
>> # nodes, including this server
>> node ip-10-251-43-210
>> node ip-10-251-43-97
>> node ip-10-251-27-191
>>
>> auto_failback off
>> respawn hacluster /usr/lib/heartbeat/ipfail
>> apiauth ipfail gid=haclient uid=hacluster
>> crm on
>>
>> logfile /var/log/ha.log
>>
>> This is on the machine node ip-10-251-43-210.
>> The log file shows a couple of things that baffle me.
>>
>> First, there are a number of warnings about the uuid of a node
>> changing. Example:
>>
>> heartbeat[20366]: 2008/05/15_10:41:55 WARN: nodename ip-10-251-43-210 uuid changed to ip-10-251-27-191
>>
>> After a few of these, I get errors that look like this:
>>
>> heartbeat[20366]: 2008/05/15_10:41:55 ERROR: send_rexmit_request: entry not found in rexmit_hash_table for seq/node(40536 ip-10-251-43-210)
>>
>> And after those errors have repeated many times, I get the following,
>> intermixed with more of the warnings above:
>>
>> heartbeat[20366]: 2008/05/15_10:41:56 ERROR: should_drop_message: attempted replay attack [ip-10-251-43-210]? [gen = 1210812007, curgen = 1210812019]
>
> You should stop the cluster, remove the hostcache and hb_uuid files
> in /var/lib/heartbeat, and start again. hb_uuid was probably
> copied between hosts or similar.
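The reset suggested above might look roughly like this, run on each affected node (a sketch only: the init-script name and the exact /var/lib/heartbeat layout vary between distributions):

```shell
# Stop heartbeat before touching its state files.
/etc/init.d/heartbeat stop

# Remove the cached node-name/uuid mapping and this node's uuid;
# heartbeat regenerates a fresh hb_uuid on the next start.
rm -f /var/lib/heartbeat/hostcache /var/lib/heartbeat/hb_uuid

/etc/init.d/heartbeat start
```

Doing this everywhere matters when the nodes were cloned from one image (common on EC2), since each clone then carries the same hb_uuid.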
>
>> If it helps, here is my haresources file:
>>
>> ip-10-251-43-210 \
>>     ldirectord \
>>     LVSSyncDaemonSwap::master
>
> You can't run an haresources-based cluster with crm set to on. And
> it can't have more than two nodes either. If you need three nodes
> you'll have to switch to CRM/v2.
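For the switch to CRM/v2, heartbeat ships a converter for legacy haresources files; a possible invocation (the script's path is distro-dependent, and the generated configuration should be reviewed before use):

```shell
# Convert the legacy haresources file into a v2 CIB.
# The script lives under heartbeat's lib directory on most installs.
/usr/lib/heartbeat/haresources2cib.py /etc/ha.d/haresources

# The output typically lands in /var/lib/heartbeat/crm/cib.xml;
# sanity-check it before starting the cluster.
crm_verify -x /var/lib/heartbeat/crm/cib.xml
```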
And disable ipfail - it won't work in a CRM-based cluster.

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
