Hi
I'm facing a problem when testing a two-node cluster with a quorum disk.
When I power off node 1, node 2 fences node 1 correctly and fails over
the service, but both before and after the fence-success messages, the
log on node 2 contains many messages like these:
Apr 24 11:30:04 [EMAIL PROTECTED] qdiskd[13740]: <crit> Node 2 is undead.
Apr 24 11:30:04 [EMAIL PROTECTED] qdiskd[13740]: <alert> Writing eviction notice for node 2
Apr 24 11:30:05 [EMAIL PROTECTED] qdiskd[13740]: <crit> Node 2 is undead.
Apr 24 11:30:05 [EMAIL PROTECTED] qdiskd[13740]: <alert> Writing eviction notice for node 2
Apr 24 11:30:06 [EMAIL PROTECTED] qdiskd[13740]: <crit> Node 2 is undead.
Apr 24 11:30:06 [EMAIL PROTECTED] qdiskd[13740]: <alert> Writing eviction notice for node 2
Apr 24 11:30:07 [EMAIL PROTECTED] qdiskd[13740]: <crit> Node 2 is undead.
Apr 24 11:30:07 [EMAIL PROTECTED] qdiskd[13740]: <alert> Writing eviction notice for node 2
Apr 24 11:30:08 [EMAIL PROTECTED] qdiskd[13740]: <crit> Node 2 is undead.
The problem is that when, after the reboot, I try to start CS5 again on
node 1, cman fails with these messages in syslog:
Apr 24 11:47:02 [EMAIL PROTECTED] ccsd[11099]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
Apr 24 11:47:02 [EMAIL PROTECTED] ccsd[11099]: cluster.conf (cluster name = A0ha2, version = 1) found.
Apr 24 11:47:02 [EMAIL PROTECTED] ccsd[11099]: Remote copy of cluster.conf is from quorate node.
Apr 24 11:47:02 [EMAIL PROTECTED] ccsd[11099]: Local version # : 1
Apr 24 11:47:02 [EMAIL PROTECTED] ccsd[11099]: Remote version #: 1
Apr 24 11:47:02 [EMAIL PROTECTED] ccsd[11099]: Remote copy of cluster.conf is from quorate node.
Apr 24 11:47:02 [EMAIL PROTECTED] ccsd[11099]: Local version # : 1
Apr 24 11:47:02 [EMAIL PROTECTED] ccsd[11099]: Remote version #: 1
Apr 24 11:47:02 [EMAIL PROTECTED] ccsd[11099]: Remote copy of cluster.conf is from quorate node.
Apr 24 11:47:02 [EMAIL PROTECTED] ccsd[11099]: Local version # : 1
Apr 24 11:47:02 [EMAIL PROTECTED] ccsd[11099]: Remote version #: 1
Apr 24 11:47:02 [EMAIL PROTECTED] ccsd[11099]: Remote copy of cluster.conf is from quorate node.
Apr 24 11:47:02 [EMAIL PROTECTED] ccsd[11099]: Local version # : 1
Apr 24 11:47:02 [EMAIL PROTECTED] ccsd[11099]: Remote version #: 1
Apr 24 11:47:31 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 30 seconds.
Apr 24 11:48:01 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 60 seconds.
Apr 24 11:48:31 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 90 seconds.
Apr 24 11:48:37 [EMAIL PROTECTED] ntpd[6179]: synchronized to 192.168.64.99, stratum 11
Apr 24 11:48:37 [EMAIL PROTECTED] ntpd[6179]: kernel time sync enabled 0001
Apr 24 11:49:01 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 120 seconds.
Apr 24 11:49:31 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 150 seconds.
Apr 24 11:50:01 [EMAIL PROTECTED] crond[11455]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Apr 24 11:50:01 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 180 seconds.
Apr 24 11:50:31 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 210 seconds.
Apr 24 11:51:01 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 240 seconds.
Apr 24 11:51:31 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 270 seconds.
Apr 24 11:52:01 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 300 seconds.
Apr 24 11:52:31 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 330 seconds.
Apr 24 11:53:01 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 360 seconds.
Apr 24 11:53:31 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 390 seconds.
Apr 24 11:54:01 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 420 seconds.
Apr 24 11:54:31 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 450 seconds ...
etc.
or, at other times, these:
Apr 24 10:17:37 [EMAIL PROTECTED] ccsd[11023]: Cluster is not quorate. Refusing connection.
Apr 24 10:17:37 [EMAIL PROTECTED] ccsd[11023]: Error while processing connect: Connection refused
Apr 24 10:17:37 [EMAIL PROTECTED] ccsd[11023]: Invalid descriptor specified (-111).
Apr 24 10:17:37 [EMAIL PROTECTED] ccsd[11023]: Someone may be attempting something evil.
Apr 24 10:17:37 [EMAIL PROTECTED] ccsd[11023]: Error while processing get: Invalid request descriptor
Apr 24 10:17:37 [EMAIL PROTECTED] ccsd[11023]: Invalid descriptor specified (-111).
Apr 24 10:17:37 [EMAIL PROTECTED] ccsd[11023]: Someone may be attempting something evil.
Apr 24 10:17:37 [EMAIL PROTECTED] ccsd[11023]: Error while processing get: Invalid request descriptor
Apr 24 10:17:37 [EMAIL PROTECTED] ccsd[11023]: Invalid descriptor specified (-21).
Apr 24 10:17:37 [EMAIL PROTECTED] ccsd[11023]: Someone may be attempting something evil.
Apr 24 10:17:37 [EMAIL PROTECTED] ccsd[11023]: Error while processing disconnect: Invalid request descriptor
Apr 24 10:17:37 [EMAIL PROTECTED] rgmanager: [11331]: <notice> Cluster Service Manager is stopped.
I can't start it again unless I first stop the CS on both nodes.
My qdisk record in cluster.conf is as follows:
<quorumd label="QDISK_2_0" interval="1" tko="10" votes="1" min_score="1">
<heuristic interval="10" tko="3" program="ping -t1 -c1 192.168.64.99" score="1"/>
<heuristic interval="10" program="ping -t3 -c1 192.168.64.99" score="1"/>
</quorumd>
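As a rough aid when reasoning about the timing involved, the small Python sketch below reads the `<quorumd>` record above and multiplies each `interval` by its `tko`. It assumes the qdisk(5) reading that a node is declared dead after `interval` × `tko` seconds of missed updates, and that a heuristic is considered failed after `tko` consecutive failures polled every `interval` seconds. The default of 1 for a heuristic with no `tko` attribute (the second heuristic here) is an assumption made for illustration, not a value taken from the documentation. The function name `qdisk_timeouts` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The <quorumd> record quoted above, copied as-is.
QUORUMD_XML = """
<quorumd label="QDISK_2_0" interval="1" tko="10" votes="1" min_score="1">
  <heuristic interval="10" tko="3" program="ping -t1 -c1 192.168.64.99" score="1"/>
  <heuristic interval="10" program="ping -t3 -c1 192.168.64.99" score="1"/>
</quorumd>
""".strip()

# ASSUMPTION: a heuristic without a tko attribute is treated as tko=1 here.
# Check qdisk(5) for the default your cluster version actually uses.
DEFAULT_HEURISTIC_TKO = 1


def qdisk_timeouts(xml_text):
    """Hypothetical helper: return (qdisk timeout, [(program, heuristic window)]) in seconds."""
    root = ET.fromstring(xml_text)
    # Node eviction window: read/write cycle length times missed cycles allowed.
    qdisk_timeout = int(root.get("interval")) * int(root.get("tko"))
    windows = []
    for h in root.findall("heuristic"):
        tko = int(h.get("tko", DEFAULT_HEURISTIC_TKO))
        # Heuristic failure window: poll period times consecutive failures allowed.
        windows.append((h.get("program"), int(h.get("interval")) * tko))
    return qdisk_timeout, windows


if __name__ == "__main__":
    timeout, windows = qdisk_timeouts(QUORUMD_XML)
    print(f"qdiskd declares a node dead after ~{timeout} s of missed updates")
    for program, seconds in windows:
        print(f"heuristic {program!r} is considered failed after ~{seconds} s")
```

With this record the sketch reports a 10-second qdisk eviction window, while the first heuristic needs about 30 seconds of consecutive ping failures before its score is removed; comparing such numbers against the cman timeouts is one way to look at the problem.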
This is urgent: does anyone have an idea what is causing this problem?
Thanks a lot
Regards.
Alain Moullé
--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster