Hi all,
I hope you can help me with this strange problem. I have a nine-node
cluster configured with no-quorum-policy set to stop.
Two days ago I came across this error on one of the nodes:

Oct 14 00:00:38 kvm06 kernel: Uhhuh. NMI received for unknown reason a1 on CPU 0.
Oct 14 00:00:38 kvm06 kernel: You have some hardware problem, likely on the PCI bus.
Oct 14 00:00:38 kvm06 kernel: Dazed and confused, but trying to continue
Oct 14 00:00:43 kvm06 corosync[2027]:   [TOTEM ] A processor failed, forming new configuration.

This error seemed to compromise the entire cluster. From that moment on
I received many other notifications about network connectivity problems
across the whole cluster. Everything ended with this:

Oct 14 00:05:06 kvm01 cib: [18970]: notice: ais_dispatch_message: Membership 6924: quorum lost

This was followed by all of the cluster's resources being stopped.

I cannot rule out network connectivity problems, but since I have
STONITH configured for every node (using IPMI, which works over a
separate, dedicated network), I expected every unreachable node to be
fenced, and this did not happen.

After quorum was lost every cluster node went offline, and the only way
to get things working again was to stop corosync on the node where the
first error appeared (kvm06, the one with the "suspected" hardware
problem). Of course I checked the hardware of this machine, and
everything seemed to be fine.

What I don't understand is why quorum was lost, since the problem
seemed to affect just one node (out of 9 in total).
I know that without full logs it is impossible to diagnose the problem,
but maybe you can offer some suggestions.
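To make the quorum arithmetic concrete, here is a minimal sketch. It assumes one vote per node and a simple-majority rule (no weighted votes, no quorum device); the helper names `quorum_threshold` and `has_quorum` are hypothetical and are not part of corosync or Pacemaker. Under those assumptions, losing only kvm06 should leave 8 of 9 votes, which is still a majority. A "quorum lost" message would then suggest that the membership split into partitions of at most 4 nodes, and not merely that one node dropped out.

```python
def quorum_threshold(total_votes: int) -> int:
    """Smallest vote count that is a strict majority (assumes 1 vote per node)."""
    return total_votes // 2 + 1


def has_quorum(votes_in_partition: int, total_votes: int) -> bool:
    """True if a partition holding this many votes is quorate under simple majority."""
    return votes_in_partition >= quorum_threshold(total_votes)


if __name__ == "__main__":
    total = 9  # nine-node cluster, one vote each (assumption)
    print(quorum_threshold(total))  # 5
    # Only kvm06 lost: the remaining 8 nodes still form a majority.
    print(has_quorum(8, total))  # True
    # If membership fragments so that no partition has more than 4 nodes,
    # no partition is quorate and, with no-quorum-policy=stop, all of them stop resources.
    print(has_quorum(4, total))  # False
```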

Thanks a lot,

-- 
RaSca
Mia Mamma Usa Linux: Nothing is impossible to understand, if you explain it well!
[email protected]
http://www.miamammausalinux.org
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems