I know it's in a ton of the example cluster configs out there, but we have had 
trouble whenever we try to use ping as a quorum heuristic.

What we have seen is that whenever a large file transfer happens, Linux starts 
dropping the pings, the heuristic test starts to fail, and the cluster node 
gets killed.

We have tried tweaking the interval, the count, and the TTL, adding a -w, etc., 
and nothing really keeps the cluster from killing nodes when they see really 
heavy network traffic.
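
For anyone who wants to reproduce this outside the cluster, here is a minimal 
sketch of the kind of ping check described above. The function names, the 
default values, and the address 10.0.0.1 are hypothetical and are not taken 
from our configuration; the only assumption about ping itself is the Linux 
iputils flags -c (count), -w (deadline), -t (TTL), and -i (interval).

```python
import subprocess


def build_ping_command(target, count=1, deadline=1, ttl=None, interval=None):
    """Build a ping command line from the knobs mentioned above.

    All defaults here are illustrative, not recommended values.
    """
    cmd = ["ping", "-c", str(count), "-w", str(deadline)]
    if ttl is not None:
        cmd += ["-t", str(ttl)]
    if interval is not None:
        cmd += ["-i", str(interval)]
    cmd.append(target)
    return cmd


def heuristic_passes(target, **knobs):
    """Run the ping and treat exit status 0 as a pass, anything else as a fail."""
    result = subprocess.run(
        build_ping_command(target, **knobs),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # 10.0.0.1 is a placeholder address; substitute a host you can reach.
    print(build_ping_command("10.0.0.1", count=3, deadline=2))
```

Running heuristic_passes() in a loop while a large file transfer is in 
progress should show whether the check fails under load on its own, 
independent of the cluster software.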


I have personally had this heuristic work in small test environments, but once 
we put it into production under really busy workloads it is pretty much useless.


It is a good idea in theory, because in a split cluster situation it would help 
ensure that the box which had network connectivity wins over the one that did 
not. But if it causes your cluster to die periodically, it's not worth it.


Is this a known issue that is just never mentioned in any of the cluster setup 
examples?

Does anyone have a similar experience, or any ideas on how to make it work in 
a very busy cluster environment?


Also, this makes me wonder: if I have a two-node cluster, with each node 
getting 1 vote, the quorum disk getting 1 vote, and the heuristic getting 1 
vote, but set the 'required' to only 2 votes, why would the heuristic cause a 
loss of quorum, since the node with the quorum disk alone would have the needed 
two votes?
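
To spell out the arithmetic behind that question, here is a small sketch of the 
vote counting as I have described it. It models only my reading of the setup 
(1 vote per node, 1 for the quorum disk, 1 for the heuristic, 2 required); the 
names are made up for illustration, and it is not a claim about how the quorum 
daemon actually scores heuristics.

```python
# Vote values exactly as stated in the question above.
NODE_VOTES = 1
QUORUM_DISK_VOTES = 1
HEURISTIC_VOTES = 1
REQUIRED_VOTES = 2


def has_quorum(votes_held, required=REQUIRED_VOTES):
    """A partition is quorate when it holds at least the required votes."""
    return votes_held >= required


# The node that owns the quorum disk while its heuristic is passing.
with_heuristic = NODE_VOTES + QUORUM_DISK_VOTES + HEURISTIC_VOTES

# The same node after the ping heuristic has failed.
without_heuristic = NODE_VOTES + QUORUM_DISK_VOTES

# A node that has neither the quorum disk nor a passing heuristic.
node_alone = NODE_VOTES

print(has_quorum(with_heuristic), has_quorum(without_heuristic), has_quorum(node_alone))
```

Under this model the node holding the quorum disk still has 2 votes after the 
heuristic fails, which is why I would not expect it to lose quorum.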


--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster