On 04/06/14 10:59 AM, Schaefer, Micah wrote:
I have a 4-node cluster running a single service group. At random
intervals, I have been seeing node1 fence node3 while node3 is actively
running the service group.

Rgmanager logs show no failures in service checks, and no other logs
provide any useful information. How can I go about finding out why node1
is fencing node3?

I currently set up the failover domain to be restricted and not include
node3.

cluster.conf : http://pastebin.com/xYy6xp6N

Random fencing is almost always caused by network failures. Can you look at the system logs, starting a little before the fence and continuing until after the fence completes, and paste them here? I suspect you will see corosync complaining.
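
If it helps, here is a minimal sketch of one way to pull that window out of a log file. It assumes classic syslog timestamps (a 15-character prefix such as "Jun  4 10:41:02" with no year) and that the daemons of interest log under the names corosync, fenced and rgmanager. The function name, the five-minute margins and the keyword list are illustrative choices of mine, not anything the cluster stack provides.

```python
from datetime import datetime, timedelta

# Assumed daemon names to look for; adjust to whatever your nodes actually log.
KEYWORDS = ("corosync", "fenced", "rgmanager")


def lines_around_fence(lines, fence_time,
                       before=timedelta(minutes=5),
                       after=timedelta(minutes=5),
                       keywords=KEYWORDS):
    """Return log lines inside the time window that mention any keyword."""
    start, end = fence_time - before, fence_time + after
    selected = []
    for line in lines:
        # Classic syslog lines carry no year, so borrow it from fence_time.
        try:
            stamp = datetime.strptime(
                f"{fence_time.year} {line[:15]}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # not a timestamped line (wrapped message, blank, etc.)
        if start <= stamp <= end and any(k in line for k in keywords):
            selected.append(line.rstrip("\n"))
    return selected
```

You would read the log on each node (commonly /var/log/messages, though that path is an assumption about your setup), pass the lines in along with the time of the fence as a datetime, and paste whatever comes back. A window that spans a year boundary is not handled.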

If this is true, do your switches support persistent multicast? Do you use active/passive bonding? Have you tried a different switch, cable, or NIC?

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
