On Fri, May 16, 2008 at 11:41 AM, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
> Hi,
>
> On Thu, May 15, 2008 at 11:45:59AM -0700, Ryan Ernst wrote:
>> Hi,
>>
>> I'm still trying to get heartbeat working on ec2.  I am using ucast as
>> previously directed by this list.
>>
>> I've set a logfile on each box so I can see what they are doing. I'm
>> currently testing with 3 nodes.  My ha.cf looks like this:
>>
>> logfacility local0
>> # ucast members - everything but this server
>
> You can have a ucast directive for this server too; heartbeat
> ignores packets from itself, and that way ha.cf stays identical on
> all nodes.
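>
> For example, listing every node (hostnames taken from this thread;
> the same file can then be deployed on every host):
>
>     ucast eth0 ip-10-251-43-210
>     ucast eth0 ip-10-251-43-97
>     ucast eth0 ip-10-251-27-191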
>
>> ucast eth0 ip-10-251-43-97
>> ucast eth0 ip-10-251-27-191
>>
>> # nodes, including this server
>> node ip-10-251-43-210
>> node ip-10-251-43-97
>> node ip-10-251-27-191
>>
>> auto_failback off
>> respawn hacluster /usr/lib/heartbeat/ipfail
>> apiauth ipfail gid=haclient uid=hacluster
>> crm on
>>
>> logfile /var/log/ha.log
>>
>>
>> This is on the machine node ip-10-251-43-210.
>> The log file shows a couple things that I am baffled by.
>>
>> First, it seems there are a number of warnings for the uuid of a node
>> changing. Example:
>>
>> heartbeat[20366]: 2008/05/15_10:41:55 WARN: nodename ip-10-251-43-210 uuid
>> changed to ip-10-251-27-191
>>
>> After a few of these, I get errors that look like this:
>>
>> heartbeat[20366]: 2008/05/15_10:41:55 ERROR: send_rexmit_request: entry not
>> found in rexmit_hash_tablefor seq/node(40536 ip-10-251-43-210)
>>
>> And after those errors have repeated many times, I get the following,
>> intermixed with more of the warnings above:
>>
>> heartbeat[20366]: 2008/05/15_10:41:56 ERROR: should_drop_message: attempted
>> replay attack [ip-10-251-43-210]? [gen = 1210812007, curgen = 1210812019]
>
> You should stop the cluster, remove the hostcache and hb_uuid files
> in /var/lib/heartbeat, and start it again. The hb_uuid file was
> probably copied between hosts, e.g. when the EC2 image was cloned.
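>
> Something like this, run on every node (a sketch, assuming the
> default init script and paths; adjust for your distribution):
>
>     /etc/init.d/heartbeat stop
>     rm /var/lib/heartbeat/hostcache /var/lib/heartbeat/hb_uuid
>     /etc/init.d/heartbeat start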
>
>> If it helps, here is my haresources file:
>>
>> ip-10-251-43-210 \
>>     ldirectord \
>>     LVSSyncDaemonSwap::master
>
> You can't run a haresources-based cluster with crm set to on. And
> it can't have more than two nodes either. If you need three nodes
> you'll have to switch to CRM/v2.
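>
> With crm on, resources live in the CIB instead of haresources. A
> rough sketch of the equivalent ldirectord resource (untested; check
> the resource agent class/provider names on your system first):
>
>     <resources>
>       <primitive id="ldirectord" class="ocf" provider="heartbeat"
>                  type="ldirectord"/>
>     </resources>
>
> which you could load with something like
> cibadmin -C -o resources -x resources.xml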

And disable ipfail; it won't work in a CRM-based cluster (its v2
replacement is pingd).
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
