On 29/05/2013, at 2:37 AM, Greg Woods <[email protected]> wrote:

> I have two clusters that are both running CentOS 5.6 and
> heartbeat-3.0.3-2.3.el5 (from the clusterlabs repo). They are running
> slightly different pacemaker versions (pacemaker-1.0.9.1-1.15.el5 on the
> first one and pacemaker-1.0.12-1.el5 on the other). They both have
> identical ha.cf files except that the bcast device names are different
> (and they are correct for each case, I checked), like this:
>
> udpport 694
> bcast eth2
> bcast eth1
> use_logd off
> logfile /var/log/halog
> debugfile /var/log/hadebug
> debug 1
> keepalive 2
> deadtime 15
> initdead 60
> node vmd1.ucar.edu
> node vmd2.ucar.edu
> auto_failback off
> respawn hacluster /usr/lib64/heartbeat/ipfail
> crm respawn
I don't know about the rest, but definitely do not use both ipfail and crm. Pick one :)

>
> On one of them (which maybe or maybe not coincidentally is having some
> problems), I get these messages logged about every 2 seconds
> in /var/log/halog, on the other I don't see them:
>
> May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG: Dumping message with 10 fields
> May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[0] : [t=NS_ackmsg]
> May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[1] : [dest=vmx2.ucar.edu]
> May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[2] : [ackseq=3a0]
> May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[3] : [(1)destuuid=0x5ceb280(37 28)]
> May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[4] : [src=vmx1.ucar.edu]
> May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[5] : [(1)srcuuid=0x5ceb390(36 27)]
> May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[6] : [hg=4c97c17a]
> May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[7] : [ts=51a13435]
> May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[8] : [ttl=3]
> May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[9] : [auth=1 23b556bcb61a08abecf87cb6411c62e62cf99f0d]
> May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG: Dumping message with 12 fields
> May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[0] : [t=status]
> May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[1] : [st=active]
> May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[2] : [dt=3a98]
> May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[3] : [protocol=1]
> May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[4] : [src=vmx1.ucar.edu]
> May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[5] : [(1)srcuuid=0x5ceb390(36 27)]
> May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[6] : [seq=17b]
> May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[7] : [hg=4c97c17a]
> May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[8] : [ts=51a13435]
> May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[9] : [ld=0.27 0.41 0.26 1/315 19183]
> May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[10] : [ttl=3]
> May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[11] : [auth=1 3d3da4df831636f7c274395041ffb49bbf215170]
>
> The questions are what do these messages actually mean, why is one
> cluster logging them and not the other, and is this something I should
> be worried about?
>
> Thanks for any info,
> --Greg
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
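
To illustrate the "pick one" advice about ipfail and crm, here is a rough sketch of how the tail of the quoted ha.cf might look if crm is the one kept. That choice is an assumption on my part (both clusters already run pacemaker); the reverse choice would instead drop the "crm respawn" line. Whether this has any bearing on the logged messages is not established in this thread.

```
auto_failback off
# ipfail disabled (assumption: crm/pacemaker is the one being kept)
# respawn hacluster /usr/lib64/heartbeat/ipfail
crm respawn
```

All other directives from the quoted file would stay unchanged, and heartbeat would need a restart on each node for the edit to take effect.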
