Every time I've seen "Retransmit List: ..." messages, the cause has turned out to be an actual network problem, with one exception. There was a bug in corosync on RHEL 6.1 (I can't remember the exact version at the moment) where sufficiently slow machines could not keep up. That bug was fixed some time ago, so I doubt it is the cause here.
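If it helps to quantify how bad the retransmits get before digging into the network, here is a rough sketch that tallies the "Retransmit List" lines from syslog. It is only an illustration: the regular expression assumes the line layout shown in the quoted log, the function name `summarize_retransmits` is made up for this example, and I am assuming the list entries are hexadecimal sequence numbers because that is what they look like.

```python
import re
from collections import Counter

# Assumed layout, taken from the quoted log, e.g.:
#   Dec 20 11:07:41 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a
RETRANSMIT = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+corosync\[\d+\]:"
    r"\s+\[TOTEM\s*\]\s+Retransmit List:\s*(?P<seqs>[0-9a-f ]+)"
)


def summarize_retransmits(lines):
    """Tally Retransmit List entries from an iterable of syslog lines.

    Returns a tuple of:
      - Counter mapping (host, timestamp) to the number of entries logged
      - length of the longest retransmit list seen
      - set of distinct sequence numbers (parsed as hex, an assumption)
    """
    per_second = Counter()
    longest = 0
    distinct = set()
    for line in lines:
        match = RETRANSMIT.match(line)
        if match is None:
            continue  # not a Retransmit List line
        seqs = match.group("seqs").split()
        per_second[(match.group("host"), match.group("stamp"))] += 1
        longest = max(longest, len(seqs))
        distinct.update(int(seq, 16) for seq in seqs)
    return per_second, longest, distinct
```

You could feed it an open log file, for example `summarize_retransmits(open("/var/log/messages"))` (the path depends on your syslog setup), and compare the busiest seconds across nodes. Note that it expects one log entry per line as written by syslog, not the wrapped lines of a quoted mail.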
Given that corosync uses multicast by default, I'd start by looking at your switch and see if the multicast group is getting messed with. Failing that, I'd just go through normal network diagnostics.

I am no corosync expert though. If someone speaks up and contradicts me, please take their advice. :)

On 12/20/2012 07:40 AM, Ulrich Windl wrote:
> Hi! > > I think I understand corosync/pacemaker a bit, but I'm wondering > occasionally: Today some node rebooted (still investigating why), and I > examined the syslog. > > Here's an interesting example: > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] New Configuration: > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] r(0) ip(172.20.3.31) r(1) > ip(192.168.0.61) > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] r(0) ip(172.20.3.35) r(1) > ip(192.168.0.65) > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] Members Left: > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] r(0) ip(172.20.3.33) r(1) > ip(192.168.0.63) > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] r(0) ip(172.20.3.34) r(1) > ip(192.168.0.64) > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] Members Joined: > Dec 20 10:34:57 o1 corosync[12690]: [pcmk ] notice: pcmk_peer_update: > Transitional membership event on ring 2496: memb=2, new=0, lost=2 > Dec 20 10:34:57 o1 corosync[12690]: [pcmk ] info: pcmk_peer_update: memb: > o1 520295596 > Dec 20 10:34:57 o1 corosync[12690]: [pcmk ] info: pcmk_peer_update: memb: > o5 587404460 > Dec 20 10:34:57 o1 corosync[12690]: [pcmk ] info: pcmk_peer_update: lost: > o3 553850028 > Dec 20 10:34:57 o1 corosync[12690]: [pcmk ] info: pcmk_peer_update: lost: > o4 570627244 > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] CLM CONFIGURATION CHANGE > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] New Configuration: > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] r(0) ip(172.20.3.31) r(1) > ip(192.168.0.61) > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] r(0) ip(172.20.3.33) r(1) > ip(192.168.0.63) > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] r(0) ip(172.20.3.34) r(1) >
ip(192.168.0.64) > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] r(0) ip(172.20.3.35) r(1) > ip(192.168.0.65) > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] Members Left: > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] Members Joined: > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] r(0) ip(172.20.3.33) r(1) > ip(192.168.0.63) > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] r(0) ip(172.20.3.34) r(1) > ip(192.168.0.64) > Dec 20 10:34:57 o1 corosync[12690]: [pcmk ] notice: pcmk_peer_update: > Stable membership event on ring 2496: memb=4, new=2, lost=0 > > Withing one second two nodes left the cluster/ring, then joined the > cluster/ring. Shouldn't the ring number increase on every change? > > In the very same second, three nodes left the cluster and joined again: > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] CLM CONFIGURATION CHANGE > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] New Configuration: > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] r(0) ip(172.20.3.31) r(1) > ip(192.168.0.61) > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] Members Left: > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] r(0) ip(172.20.3.33) r(1) > ip(192.168.0.63) > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] r(0) ip(172.20.3.34) r(1) > ip(192.168.0.64) > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] r(0) ip(172.20.3.35) r(1) > ip(192.168.0.65) > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] Members Joined: > Dec 20 10:34:57 o1 corosync[12690]: [pcmk ] notice: pcmk_peer_update: > Transitional membership event on ring 2504: memb=1, new=0, lost=3 > Dec 20 10:34:57 o1 corosync[12690]: [pcmk ] info: pcmk_peer_update: memb: > o1 520295596 > Dec 20 10:34:57 o1 corosync[12690]: [pcmk ] info: pcmk_peer_update: lost: > o3 553850028 > Dec 20 10:34:57 o1 corosync[12690]: [pcmk ] info: pcmk_peer_update: lost: > o4 570627244 > Dec 20 10:34:57 o1 corosync[12690]: [pcmk ] info: pcmk_peer_update: lost: > o5 587404460 > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] CLM CONFIGURATION CHANGE > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] New 
Configuration: > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] r(0) ip(172.20.3.31) r(1) > ip(192.168.0.61) > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] r(0) ip(172.20.3.33) r(1) > ip(192.168.0.63) > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] r(0) ip(172.20.3.34) r(1) > ip(192.168.0.64) > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] r(0) ip(172.20.3.35) r(1) > ip(192.168.0.65) > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] Members Left: > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] Members Joined: > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] r(0) ip(172.20.3.33) r(1) > ip(192.168.0.63) > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] r(0) ip(172.20.3.34) r(1) > ip(192.168.0.64) > Dec 20 10:34:57 o1 corosync[12690]: [CLM ] r(0) ip(172.20.3.35) r(1) > ip(192.168.0.65) > Dec 20 10:34:57 o1 corosync[12690]: [pcmk ] notice: pcmk_peer_update: > Stable membership event on ring 2504: memb=4, new=3, lost=0 > > A moment later I saw this: > Dec 20 10:34:57 rkdvmso1 kernel: [185601.044523] kernel BUG at > /usr/src/packages/BUILD/ocfs2-1.6/xen/ocfs2/heartbeat.c:67! > [...] > Dec 20 10:34:58 o1 kernel: [185601.044674] Supported: Yes > Dec 20 10:34:58 o1 kernel: [185601.044678] > Dec 20 10:34:59 o1 kernel: [185601.044682] Pid: 14239, comm: ocfs2_controld. > Not tainted 3.0.42-0.7-xen #1 Sun Microsystems Sun Fire X4100 Server/Sun Fire > X4100 Server > Dec 20 10:34:59 o1 kernel: [185601.044692] RIP: e030:[<ffffffffa06818f5>] > [<ffffffffa06818f5>] ocfs2_do_node_down+0x65/0x70 [ocfs2] > Dec 20 10:35:00 o1 kernel: [185601.044745] RSP: e02b:ffff880032331e18 > EFLAGS: 00010246 > Dec 20 10:35:00 o1 kernel: [185601.044749] RAX: 0000000000000000 RBX: > ffff880032960da0 RCX: 000000000000001f > Dec 20 10:35:00 o1 kernel: [185601.044753] RDX: 0000000000000000 RSI: > ffff8800314b5000 RDI: 000000001f0314ac > [???] > > (The bug messages were interleaved with cluster messages (cLVM and OCFS2 are > quite chatty). 
Before completion, SBD kicked in:) > > Dec 20 10:34:59 o1 sbd: [12635]: info: Received command off from o3 on disk > /dev/disk/by-id/dm-name-Shared-E1_part1 > Dec 20 10:34:59 o1 sbd: [12636]: info: Received command off from o3 on disk > /dev/disk/by-id/dm-name-Shared-E2_part1 > Dec 20 10:34:59 o1 cluster-dlm: check_fencing_done: > 0192F256F87A4E5CA69BCF2BDF7659FA check_fencing 520295596 wait add 1355810586 > fail 1355996098 last 0 > Dec 20 10:34:59 o1 sbd: [12635]: info: sysrq-trigger: o > Dec 20 10:34:59 o1 sbd: [12636]: info: sysrq-trigger: o > Dec 20 10:34:59 o1 sbd: [12635]: EMERG: Rebooting system. Reason: sbd is > self-fencing (power-off) > Dec 20 10:34:59 o1 sbd: [12636]: EMERG: Rebooting system. Reason: sbd is > self-fencing (power-off) > > The following reboot also replaced the kernel 3.0.42-0.7-xen with > 3.0.51-0.7.9-xen (a reboot was intended anyway, but manually ;-) > > (Reboot also fenced the DC, and another DC was elected) > > After a short wile I saw messages like these: > Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2] Retry #2 of cpg_mcast_joined: > SA_AIS_ERR_TRY_AGAIN > Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2] Retry #3 of cpg_mcast_joined: > SA_AIS_ERR_TRY_AGAIN > Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2] Retry #4 of cpg_mcast_joined: > SA_AIS_ERR_TRY_AGAIN > Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2] Retry #5 of cpg_mcast_joined: > SA_AIS_ERR_TRY_AGAIN > Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2] Retry #6 of cpg_mcast_joined: > SA_AIS_ERR_TRY_AGAIN > Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2] Retry #7 of cpg_mcast_joined: > SA_AIS_ERR_TRY_AGAIN > Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2] Retry #8 of cpg_mcast_joined: > SA_AIS_ERR_TRY_AGAIN > Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2] Retry #9 of cpg_mcast_joined: > SA_AIS_ERR_TRY_AGAIN > Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2] Retry #10 of > cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN > Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2] Retry #20 of > 
cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN > Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2] Retry #30 of > cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN > Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2] Retry #40 of > cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN > Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2] Retry #50 of > cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN > Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2] Retry #60 of > cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN > Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2] Retry #70 of > cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN > Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2] Retry #80 of > cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN > Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2] Retry #90 of > cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN > Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2] Retry #100 of > cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN > Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2] Retry #200 of > cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN > Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2] Retry #300 of > cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN > Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2] Retry #400 of > cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN > Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2] Retry #500 of > cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN > Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2] Retry #600 of > cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN > Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2] Retry #700 of > cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN > Dec 20 10:42:40 o1 cmirrord[16392]: [35cRf7c2] Retry #800 of > cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN > Dec 20 10:42:40 o1 cmirrord[16392]: [35cRf7c2] Retry #900 of > cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN > Dec 20 10:42:40 o1 cmirrord[16392]: [35cRf7c2] Retry #1000 of > cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN - OpenAIS not handling the load? > Dec 20 10:42:41 o1 cmirrord[16392]: [35cRf7c2] Retry #2000 of > cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN - OpenAIS not handling the load? 
> Dec 20 10:42:42 o1 cmirrord[16392]: [35cRf7c2] Retry #3000 of > cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN - OpenAIS not handling the load? > Dec 20 10:42:43 o1 cmirrord[16392]: [35cRf7c2] Retry #4000 of > cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN - OpenAIS not handling the load? > Dec 20 10:42:44 o1 cluster-dlm: update_cluster: Processing membership 2536 > Dec 20 10:42:44 o1 corosync[12829]: [CLM ] CLM CONFIGURATION CHANGE > Dec 20 10:42:44 o1 corosync[12829]: [CLM ] New Configuration: > Dec 20 10:42:44 o1 cluster-dlm: dlm_process_node: Skipped active node > 520295596: born-on=2520, last-seen=2536, this-event=2536, last-event=2524 > Dec 20 10:42:44 o1 corosync[12829]: [CLM ] r(0) ip(172.20.3.31) r(1) > ip(192.168.0.61) > Dec 20 10:42:44 o1 corosync[12829]: [CLM ] r(0) ip(172.20.3.32) r(1) > ip(192.168.0.62) > Dec 20 10:42:44 o1 cluster-dlm: dlm_process_node: Skipped active node > 537072812: born-on=2512, last-seen=2536, this-event=2536, last-event=2524 > Dec 20 10:42:44 o1 corosync[12829]: [CLM ] r(0) ip(172.20.3.34) r(1) > ip(192.168.0.64) > Dec 20 10:42:44 o1 corosync[12829]: [CLM ] r(0) ip(172.20.3.35) r(1) > ip(192.168.0.65) > Dec 20 10:42:44 o1 corosync[12829]: [CLM ] Members Left: > Dec 20 10:42:44 o1 corosync[12829]: [CLM ] Members Joined: > Dec 20 10:42:44 o1 corosync[12829]: [pcmk ] notice: pcmk_peer_update: > Transitional membership event on ring 2536: memb=4, new=0, lost=0 > > At some later time I saw the start of what I call "retransmit pyramid": > Dec 20 11:07:38 o1 corosync[12829]: [TOTEM ] Retransmit List: d2d > Dec 20 11:07:38 o1 corosync[12829]: [TOTEM ] Retransmit List: d2f > Dec 20 11:07:38 o1 corosync[12829]: [TOTEM ] Retransmit List: d31 > Dec 20 11:07:38 o1 corosync[12829]: [TOTEM ] Retransmit List: d33 > Dec 20 11:07:38 o1 corosync[12829]: [TOTEM ] Retransmit List: d35 > Dec 20 11:07:38 o1 corosync[12829]: [TOTEM ] Retransmit List: d37 > Dec 20 11:07:38 o1 corosync[12829]: [TOTEM ] Retransmit List: d39 > [...] 
> Dec 20 11:07:38 o1 corosync[12829]: [TOTEM ] Retransmit List: d5f > Dec 20 11:07:38 o1 corosync[12829]: [TOTEM ] Retransmit List: d60 > Dec 20 11:07:38 o1 corosync[12829]: [TOTEM ] Retransmit List: d61 > Dec 20 11:07:38 o1 corosync[12829]: [TOTEM ] Retransmit List: d62 > Dec 20 11:07:41 o1 corosync[12829]: [TOTEM ] Retransmit List: d68 > Dec 20 11:07:41 o1 corosync[12829]: [TOTEM ] Retransmit List: d69 > Dec 20 11:07:41 o1 corosync[12829]: [TOTEM ] Retransmit List: d6a > Dec 20 11:07:41 o1 corosync[12829]: [TOTEM ] Retransmit List: d6a > Dec 20 11:07:41 o1 corosync[12829]: [TOTEM ] Retransmit List: d6c > Dec 20 11:07:41 o1 corosync[12829]: [TOTEM ] Marking ringid 0 interface > 172.20.3.31 FAULTY > Dec 20 11:07:41 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 > Dec 20 11:07:41 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a > Dec 20 11:07:41 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > Dec 20 11:07:41 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b d7c > Dec 20 11:07:41 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d > Dec 20 11:07:41 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e > Dec 20 11:07:41 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e > Dec 20 11:07:41 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e > Dec 20 11:07:41 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e > Dec 20 11:07:41 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e > Dec 20 11:07:41 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e > Dec 20 11:07:41 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e > Dec 20 11:07:41 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e > Dec 20 11:07:41 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f > Dec 20 11:07:41 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f 
d80 > Dec 20 11:07:41 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 > Dec 20 11:07:41 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 > Dec 20 11:07:41 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 > Dec 20 11:07:41 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 > [...] > Dec 20 11:07:42 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 > Dec 20 11:07:42 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 > Dec 20 11:07:42 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 > Dec 20 11:07:42 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 > Dec 20 11:07:42 o1 corosync[12829]: [TOTEM ] Automatically recovered ring 0 > Dec 20 11:07:42 o1 corosync[12829]: [TOTEM ] Automatically recovered ring 0 > Dec 20 11:07:42 o1 corosync[12829]: [TOTEM ] Automatically recovered ring 0 > Dec 20 11:07:42 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 > Dec 20 11:07:42 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 > Dec 20 11:07:42 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 > Dec 20 11:07:42 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 > Dec 20 11:07:43 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 > Dec 20 11:07:43 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 > Dec 20 11:07:43 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 > Dec 20 11:07:43 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 > Dec 20 11:07:43 o1 corosync[12829]: [TOTEM ] Retransmit 
List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 > Dec 20 11:07:44 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 > Dec 20 11:07:44 o1 corosync[12829]: [TOTEM ] Marking ringid 0 interface > 172.20.3.31 FAULTY > Dec 20 11:07:44 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 > Dec 20 11:07:44 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 > Dec 20 11:07:44 o1 corosync[12829]: [TOTEM ] Automatically recovered ring 0 > Dec 20 11:07:44 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 > Dec 20 11:07:44 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 > Dec 20 11:07:45 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 > Dec 20 11:07:45 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 > Dec 20 11:08:12 o1 corosync[12829]: [TOTEM ] Retransmit List: d78 d79 d7a > d7b d7c d7d d7e d7f d80 d81 > Dec 20 11:08:12 o1 corosync[12829]: [TOTEM ] Retransmit List: d78 d79 d7a > d7b d7c d7d d7e d7f d80 d81 > Dec 20 11:08:12 o1 corosync[12829]: [TOTEM ] Retransmit List: d78 d79 d7a > d7b d7c d7d d7e d7f d80 d81 > Dec 20 11:08:12 o1 corosync[12829]: [TOTEM ] Automatically recovered ring 1 > Dec 20 11:08:12 o1 corosync[12829]: [TOTEM ] Automatically recovered ring 1 > Dec 20 11:08:12 o1 corosync[12829]: [TOTEM ] Retransmit List: d78 d79 d7a > d7b d7c d7d d7e d7f d80 d81 > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 > Dec 20 11:08:13 o1 
corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d96 > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Marking ringid 0 interface > 172.20.3.31 FAULTY > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 d9a > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 d9a > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 d9a > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 d9a > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 d9c > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 d9c d9d > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 d9c d9d 
d9e > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e da0 > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e da0 > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e da0 > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e da0 da1 > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e da0 da1 da2 > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e da0 da1 da2 da3 > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e da0 da1 da2 da3 > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e da0 da1 da2 da3 da4 > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e da0 da1 da2 da3 da4 da5 > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e da0 da1 da2 da3 da4 da5 da6 > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e da0 da1 da2 da3 da4 da5 da6 > 
Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e da0 da1 da2 da3 da4 da5 da6 da7 > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: d79 d7a d7b > d7c d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e da0 da1 da2 da3 da4 da5 da6 da7 > da8 > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: da8 d79 d7a > d7b d7c d7d d7e d7f d80 d81 d98 d9c d9d d9e da0 da1 da2 da3 da4 da5 da6 da7 > da9 > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: da9 d79 d7a > d7b d7c d7d d7e d7f d80 d81 d98 d9c d9d d9e da0 da1 da2 da3 da4 da5 da6 da7 > da8 > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: da8 d79 d7a > d7b d7c d7d d7e d7f d80 d81 d98 d9c d9d d9e da0 da1 da2 da3 da4 da5 da6 da7 > da9 > [...] > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: da8 d79 d7a > d7b d7c d7d d7e d7f d80 d81 d98 d9c d9d d9e da0 da1 da2 da3 da4 da5 da6 da7 > da9 > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: da9 d79 d7a > d7b d7c d7d d7e d7f d80 d81 d98 d9c d9d d9e da0 da1 da2 da3 da4 da5 da6 da7 > da8 > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: da8 d79 d7a > d7b d7c d7d d7e d7f d80 d81 d98 d9c d9d d9e da0 da1 da2 da3 da4 da5 da6 da7 > da9 > Dec 20 11:08:13 o1 corosync[12829]: [TOTEM ] Retransmit List: da9 d79 d82 > d98 d9c d9d d9e da0 da1 da2 da3 da4 da5 da6 da7 da8 > Dec 20 11:08:14 o1 corosync[12829]: [TOTEM ] Retransmit List: d98 d9c d9d > d9e da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 > Dec 20 11:08:14 o1 corosync[12829]: [TOTEM ] Retransmit List: d98 d9c d9d > d9e da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 > Dec 20 11:08:14 o1 corosync[12829]: [TOTEM ] Retransmit List: d98 d9c d9d > d9e da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 > Dec 20 11:08:14 o1 corosync[12829]: [TOTEM ] Automatically recovered ring 0 > Dec 20 11:08:14 o1 corosync[12829]: [TOTEM ] Retransmit List: d9b d9f d98 > d9c d9d d9e da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 > Dec 20 
11:08:14 o1 corosync[12829]: [TOTEM ] Retransmit List: d9c d9e da1 > da3 da5 da7 da9 > Dec 20 11:08:14 o1 corosync[12829]: [TOTEM ] Retransmit List: d9e da3 da7 > Dec 20 11:08:14 o1 corosync[12829]: [TOTEM ] Retransmit List: da3 > Dec 20 11:08:14 o1 corosync[12829]: [TOTEM ] Retransmit List: da3 dcc dcd > Dec 20 11:08:14 o1 corosync[12829]: [TOTEM ] Retransmit List: dcc > Dec 20 11:08:14 o1 corosync[12829]: [TOTEM ] Retransmit List: dcf > Dec 20 11:08:14 o1 corosync[12829]: [TOTEM ] Retransmit List: dcf > Dec 20 11:08:14 o1 corosync[12829]: [TOTEM ] Retransmit List: dcf dd0 > Dec 20 11:08:14 o1 corosync[12829]: [TOTEM ] Retransmit List: dd0 dd1 > Dec 20 11:08:14 o1 corosync[12829]: [TOTEM ] Retransmit List: dd0 > Dec 20 11:08:14 o1 corosync[12829]: [TOTEM ] Retransmit List: dd4 > Dec 20 11:08:14 o1 corosync[12829]: [TOTEM ] Retransmit List: dd7 > Dec 20 11:08:14 o1 corosync[12829]: [TOTEM ] Retransmit List: dd9 > Dec 20 11:08:14 o1 corosync[12829]: [TOTEM ] Retransmit List: dda > [...] > Dec 20 11:08:15 o1 corosync[12829]: [TOTEM ] Retransmit List: e74 > Dec 20 11:08:15 o1 corosync[12829]: [TOTEM ] Retransmit List: e76 > Dec 20 11:08:15 o1 corosync[12829]: [TOTEM ] Retransmit List: e76 e77 > Dec 20 11:08:15 o1 corosync[12829]: [TOTEM ] Retransmit List: e76 e78 > Dec 20 11:08:15 o1 corosync[12829]: [TOTEM ] Retransmit List: e76 > Dec 20 11:08:15 o1 corosync[12829]: [TOTEM ] Retransmit List: e76 > Dec 20 11:08:15 o1 corosync[12829]: [TOTEM ] Marking ringid 1 interface > 192.168.0.61 FAULTY > Dec 20 11:08:16 o1 corosync[12829]: [TOTEM ] Automatically recovered ring 1 > Dec 20 11:08:16 o1 corosync[12829]: [TOTEM ] Automatically recovered ring 1 > Dec 20 11:08:16 o1 corosync[12829]: [TOTEM ] Automatically recovered ring 1 > Dec 20 11:08:19 o1 corosync[12829]: [TOTEM ] Retransmit List: e82 > Dec 20 11:08:19 o1 corosync[12829]: [TOTEM ] Retransmit List: e85 > Dec 20 11:08:24 o1 cib: [12867]: info: cib_stats: Processed 61 operations > (0.00us average, 0% utilization) in 
the last 10min > > So there was a significant "blackout" of communications. I always wondered > whether this is purely a software problem. At the same time I had even a > longer retransmit list on another node, while some nodes showed no problem at > all: > > Dec 20 11:08:11 o5 corosync[12677]: [TOTEM ] Retransmit List: d65 d66 d67 > d68 d69 d6a d6b d6c d6d d6e d6f d70 d71 d72 d73 d74 d75 d76 d77 d78 d79 d7a > d7b d7c d7d d7e d7f d80 d81 d82 > > Does anybody know what causes these messages? > Regards, > Ulrich > > _______________________________________________ > Linux-HA mailing list > [email protected] > http://lists.linux-ha.org/mailman/listinfo/linux-ha > See also: http://linux-ha.org/ReportingProblems >

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
