Dear all,

I'm having a rather peculiar problem with Heartbeat. I'm installing a new HA cluster with Heartbeat and Pacemaker. Right now only one node is installed, because I'm preparing the install image for the entire cluster.

Since the last restart of the machine, heartbeat has been stuck in what looks like an endless loop, sending broadcasts to the network at a very high rate. Essentially the process sends as fast as the CPU allows (around 3,000 broadcasts per second). I can't rule out a misconfiguration on my side, but it looks more like a bug. The machine is still running in that state, in case you want me to try some debugging.

Thanks for looking at this,
Rainer Schwemmer

Details about the configuration:

The resource configuration is just an empty cluster with one node and no resources running.

The versions of programs I am using:

The server is installed with Linux RHEL5.5, 2.6.18-128.1.6.el5 #1 SMP

--------------------------------
These are the heartbeat versions:
heartbeat-mgmt-2.0.1-1.lhcb
heartbeat-debuginfo-3.0.3-2.3.el5
heartbeat-3.0.3-2.3.el5
heartbeat-libs-3.0.3-2.3.el5
heartbeat-mgmt-debuginfo-2.0.1-1.lhcb
heartbeat-devel-3.0.3-2.3.el5
cluster-glue-1.0.6-1.6.el5
cluster-glue-libs-1.0.6-1.6.el5
(Actually the heartbeat-mgmt is the pacemaker management module)

--------------------------------
The pacemaker packages:
pacemaker-debuginfo-1.0.10-1.4.el5
pacemaker-1.0.10-1.4.el5
pacemaker-libs-devel-1.0.10-1.4.el5
pacemaker-libs-1.0.10-1.4.el5

--------------------------------
Top of the machine:
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+  COMMAND
 9301 root      -2   0 71260  11m 8008 R 100.1  0.0   5905:36  heartbeat: master control process
10988 hacluste  15   0 73928 3036 2456 S  56.1  0.0   3853:50  /usr/lib64/heartbeat/crmd
 9319 root      -2   0 69392 9.8m 8008 S  11.6  0.0   1030:46  heartbeat: write: bcast eth2
 9320 root      -2   0 69392 9.8m 8008 S   5.9  0.0 493:14.62  heartbeat: read: bcast eth2
16594 root      15   0 13264 1712  816 R   1.0  0.0   0:00.82  top

--------------------------------
ha.cf:
debugfile /var/log/ha-debug
logfile /var/log/ha-log
keepalive 1
warntime 8
deadtime 20
initdead 40
bcast eth2
auto_failback on
autojoin any
crm yes
apiauth ipfail uid=hacluster
apiauth ccm uid=hacluster,root
apiauth cms uid=hacluster,root
apiauth ping gid=haclient uid=hacluster,root
apiauth default gid=haclient uid=hacluster,root
apiauth mgmtd uid=root,ebonacco,rainer,hacluster
respawn root /usr/lib64/heartbeat/mgmtd -v
conn_logd_time 60

We are using broadcasts here, because we had some trouble with our switches and multicast.

--------------------------------
This is what is inside the broadcasts it sends:

>>>
__name__=create_request_adv
__name__=create_request_adv
origin=te_rsc_command
t=crmd
version=3.0.1
subt=request
reference=lrm_invoke-tengine-1301706874-335
crm_task=lrm_invoke
crm_sys_to=lrmd
crm_sys_from=tengine
crm_host_to=store07.lbdaq.cern.ch
dest=store07.lbdaq.cern.ch
oseq=42e6c2c5
from_id=crmd
to_id=crmd
client_gen=4
src=store07
seq=42ec220d
hg=4d8f206f
ts=4d999a1a
ld=1.67 1.87 2.05 4/883 16221
ttl=3
auth=1 d20f08208a30acc15a0492d438a9eec6
crm_xml=<crm_xml><rsc_op id="2" operation="probe_complete" operation_key="probe_complete" on_node="store07.lbdaq.cern.ch" on_node_uuid="bd27618a-860b-43e8-93a1-6706b79fbea5" transition-key="2:164:0:ab8d208d-09e4-4b10-b5db-103808bad101"><attributes CRM_meta_op_no_wait="true" crm_feature_set="3.0.1"/></rsc_op></crm_xml>
client_gen=4
(1)destuuid=vSdhioYLQ+iToWcGt5++pQ==
(1)srcuuid=vSdhioYLQ+iToWcGt5++pQ==
<<<
.>>>
__name__=create_request_adv
__name__=create_request_adv
origin=te_rsc_command
t=crmd
version=3.0.1
subt=request
reference=lrm_invoke-tengine-1301572752-37
crm_task=lrm_invoke
crm_sys_to=lrmd
crm_sys_from=tengine
crm_host_to=store07.lbdaq.cern.ch
dest=store07.lbdaq.cern.ch
oseq=42e6c2c6
from_id=crmd
to_id=crmd
client_gen=4
src=store07
seq=42ec220e
hg=4d8f206f
ts=4d999a1a
ld=1.67 1.87 2.05 2/883 16221
ttl=3
auth=1 35e66064c82f6e9c9b0f87f9825ead32
crm_xml=<crm_xml><rsc_op id="2" operation="probe_complete" operation_key="probe_complete" on_node="store07.lbdaq.cern.ch" on_node_uuid="bd27618a-860b-43e8-93a1-6706b79fbea5" transition-key="2:15:0:ab8d208d-09e4-4b10-b5db-103808bad101"><attributes CRM_meta_op_no_wait="true" crm_feature_set="3.0.1"/></rsc_op></crm_xml>
client_gen=4
(1)destuuid=vSdhioYLQ+iToWcGt5++pQ==
(1)srcuuid=vSdhioYLQ+iToWcGt5++pQ==
<<<
.>>>
t=NS_ackmsg
dest=store07
ackseq=42ec220f
(1)destuuid=vSdhioYLQ+iToWcGt5++pQ==
src=store07
(1)srcuuid=vSdhioYLQ+iToWcGt5++pQ==
hg=4d8f206f
ts=4d999a1a
ttl=3
auth=1 30635c8380404531c17c874c0bbda22d
<<<
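In case it helps with triage, here is a minimal sketch of how a captured dump like the one above could be summarised. It assumes the capture has been saved as text with one `name=value` field per line and each message framed by `>>>` and `<<<` lines (a leading `.` before `>>>` is tolerated, as in the dump). The function names `parse_frames` and `count_types` are made up for illustration and are not part of Heartbeat.

```python
from collections import Counter


def parse_frames(dump):
    """Split a text dump into messages framed by '>>>' and '<<<' lines.

    Assumption: one name=value field per line. Returns a list of dicts.
    """
    frames = []
    current = None
    for raw in dump.splitlines():
        line = raw.strip()
        if line.lstrip(".") == ">>>":      # frame start, optional leading '.'
            current = {}
        elif line == "<<<":                # frame end
            if current is not None:
                frames.append(current)
            current = None
        elif current is not None and "=" in line:
            # Split on the first '=' only; values may contain '=' or spaces.
            name, _, value = line.partition("=")
            current[name] = value          # a repeated field keeps its last value
    return frames


def count_types(frames):
    """Count messages by their 't' (type) field."""
    return Counter(frame.get("t", "?") for frame in frames)
```

Running `count_types(parse_frames(text))` over a longer capture would show how the traffic splits between `crmd` requests and `NS_ackmsg` acknowledgements. Counting by the `reference` field in the same way would show whether the same few requests are being resent.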
