I have two clusters that are both running CentOS 5.6 and heartbeat-3.0.3-2.3.el5 (from the clusterlabs repo). They are running slightly different pacemaker versions (pacemaker-1.0.9.1-1.15.el5 on the first one and pacemaker-1.0.12-1.el5 on the other). They both have identical ha.cf files, except that the bcast device names differ (and they are correct in each case, I checked), like this:

udpport 694
bcast eth2
bcast eth1
use_logd off
logfile /var/log/halog
debugfile /var/log/hadebug
debug 1
keepalive 2
deadtime 15
initdead 60
node vmd1.ucar.edu
node vmd2.ucar.edu
auto_failback off
respawn hacluster /usr/lib64/heartbeat/ipfail
crm respawn

On one of them (which may or may not coincidentally be having some problems), I get these messages logged about every 2 seconds in /var/log/halog; on the other I don't see them:

May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG: Dumping message with 10 fields
May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[0] : [t=NS_ackmsg]
May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[1] : [dest=vmx2.ucar.edu]
May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[2] : [ackseq=3a0]
May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[3] : [(1)destuuid=0x5ceb280(37 28)]
May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[4] : [src=vmx1.ucar.edu]
May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[5] : [(1)srcuuid=0x5ceb390(36 27)]
May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[6] : [hg=4c97c17a]
May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[7] : [ts=51a13435]
May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[8] : [ttl=3]
May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[9] : [auth=1 23b556bcb61a08abecf87cb6411c62e62cf99f0d]
May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG: Dumping message with 12 fields
May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[0] : [t=status]
May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[1] : [st=active]
May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[2] : [dt=3a98]
May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[3] : [protocol=1]
May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[4] : [src=vmx1.ucar.edu]
May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[5] : [(1)srcuuid=0x5ceb390(36 27)]
May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[6] : [seq=17b]
May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[7] : [hg=4c97c17a]
May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[8] : [ts=51a13435]
May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[9] : [ld=0.27 0.41 0.26 1/315 19183]
May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[10] : [ttl=3]
May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[11] : [auth=1 3d3da4df831636f7c274395041ffb49bbf215170]

The questions are: what do these messages actually mean, why is one cluster logging them and not the other, and is this something I should be worried about?

Thanks for any info,
--Greg

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
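
For anyone trying to read the dumped fields: below is a minimal Python sketch that pulls the key=value pairs out of the "MSG[n] : [...]" lines and decodes two of the hex values. It rests on assumptions, not on the heartbeat source: that ts is a hex Unix timestamp and that dt is a hex deadtime in milliseconds. The names FIELD_RE and parse_field are made up for this illustration.

```python
import re
from datetime import datetime, timezone

# Matches the "MSG[n] : [key=value]" tail of a dumped heartbeat log line.
FIELD_RE = re.compile(r"MSG\[(\d+)\] : \[([^=\]]+)=(.*)\]\s*$")


def parse_field(line):
    """Return (index, key, value) for a dumped field line, or None."""
    m = FIELD_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), m.group(2), m.group(3)


# Two lines copied from the log excerpt in this post.
lines = [
    "May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[2] : [dt=3a98]",
    "May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[8] : [ts=51a13435]",
]

fields = {}
for line in lines:
    parsed = parse_field(line)
    if parsed is not None:
        _, key, value = parsed
        fields[key] = value

# Assumption: dt is the deadtime in milliseconds, written in hex.
dt_ms = int(fields["dt"], 16)

# Assumption: ts is seconds since the Unix epoch, written in hex.
ts = datetime.fromtimestamp(int(fields["ts"], 16), tz=timezone.utc)

print(dt_ms)           # 15000
print(ts.isoformat())  # 2013-05-25T21:59:17+00:00
```

Under those assumptions, dt=3a98 is 15000 ms, which would line up with "deadtime 15" in ha.cf, and ts=51a13435 is 21:59:17 UTC on May 25, the same minute and second as the 15:59:17 local log timestamps. That suggests the dumped messages are ordinary status and ack packets rather than corrupted data, but it does not explain why they are being logged at ERROR level.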