Hi there,
In one of my clusters I still have problems with "retransmit list"
messages. The problem is not reproducible: sometimes, while the cluster
is changing its state (for example when migrating a VM from one node to
another), the "retransmit list" messages start appearing and, in the
worst case, the cluster loses quorum.

I followed what Florian wrote here:
http://www.hastexo.com/resources/hints-and-kinks/whats-totem-retransmit-list-all-about-corosync
but I still have some doubts.

I'm sure that this 9-node cluster is composed of identical machines and
I'm quite sure that multicast on the network has no problems, even
though the nodes are distributed across different enclosures. I said
"quite" because I've done some tests with tools like MNC and the
connection seems to be fine and not losing anything.

It seems that when a configuration message has to travel around the
ring, in some particular cases, everything collapses. Following
Florian's article I've tried setting a window_size of 300, but since
everything is the same, I think that with the default netmtu of 1500,
and following the corosync man page, I must not go over 170 (which is
256000/1500).
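
To make that arithmetic explicit, here is a minimal sketch. It assumes
the rule of thumb from corosync.conf(5) that, when slower nodes may be
present, window_size should be no larger than 256000 / netmtu. The
helper name max_window_size is only illustrative and is not part of
corosync.

```python
# Rule-of-thumb ceiling for totem window_size, assuming the
# "256000 / netmtu" guidance from the corosync.conf(5) man page.
RECEIVE_BUDGET = 256000  # constant quoted in the man page guidance


def max_window_size(netmtu: int) -> int:
    """Return the largest whole window_size allowed by the rule of thumb."""
    if netmtu <= 0:
        raise ValueError("netmtu must be a positive integer")
    # Integer division: window_size is a message count, so round down.
    return RECEIVE_BUDGET // netmtu


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"netmtu={mtu}: window_size <= {max_window_size(mtu)}")
```

With the default netmtu of 1500 this gives 170, so a window_size of 300
would be above the suggested ceiling unless every node performs equally
well.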

The point is: what else can I check? Does it make sense to set a
window_size LOWER than 50?

Thanks for your help,

-- 
RaSca
Mia Mamma Usa Linux: Niente è impossibile da capire, se lo spieghi bene!
[email protected]
http://www.miamammausalinux.org
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
