Hi there,

In one of my clusters I still have problems with "retransmit list" messages. The problem is not reproducible: sometimes, while the cluster is changing its state (for example when migrating a VM from one node to another), the "retransmit list" messages start and, in the worst case, the cluster loses quorum.
I followed what Florian wrote here:
http://www.hastexo.com/resources/hints-and-kinks/whats-totem-retransmit-list-all-about-corosync
but I still have some doubts.

I'm sure that this 9-node cluster is composed of identical machines, and I'm quite sure that multicast on the network has no problems, even though the nodes are distributed across different enclosures. I say "quite" because I've done some tests with tools like MNC, and the connection seems to be fine and is not losing anything. It seems that when a configuration message has to travel around the ring, in some particular cases, everything collapses.

Following Florian's article I've tried setting a window_size of 300, but since everything is the same, I think that, with the default netmtu of 1500 and following the corosync man page, I must not go over 170 (which is 256000/1500, the "256000 / netmtu" limit given there).

The point is: what else can I check? Does it make sense to set a window_size LOWER than 50?

Thanks for your help,

--
RaSca
Mia Mamma Usa Linux: Niente è impossibile da capire, se lo spieghi bene!
http://www.miamammausalinux.org
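
P.S. To make the arithmetic explicit, here is a small sketch of the limit as I read it in corosync.conf(5): when one or more slow nodes are present, window_size should be no larger than 256000 / netmtu. The helper name max_window_size and the constant name are mine, purely for illustration; they are not part of corosync.

```python
# Sketch of the window_size ceiling as described in corosync.conf(5).
# max_window_size is a hypothetical helper, not a corosync API.

RECV_BUFFER_BUDGET = 256000  # constant from the man page's "256000 / netmtu"


def max_window_size(netmtu: int = 1500) -> int:
    """Largest window_size suggested when slow nodes may be present."""
    if netmtu <= 0:
        raise ValueError("netmtu must be positive")
    # Integer division: window_size is a whole number of messages.
    return RECV_BUFFER_BUDGET // netmtu


if __name__ == "__main__":
    print(max_window_size(1500))  # default netmtu -> 170
    print(max_window_size(9000))  # jumbo frames  -> 28
```

With the default netmtu of 1500 this gives 170, so a window_size of 300 is above that ceiling, while the default of 50 is well below it.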
