Hi

I'm facing a problem:

When testing a two-node cluster with a quorum disk, if I power off node 1,
node 2 fences node 1 correctly and fails over the service. However, node 2's
log contains many messages like these, both before and after the fence
success messages:
Apr 24 11:30:04 [EMAIL PROTECTED] qdiskd[13740]: <crit> Node 2 is undead.
Apr 24 11:30:04 [EMAIL PROTECTED] qdiskd[13740]: <alert> Writing eviction notice for node 2
Apr 24 11:30:05 [EMAIL PROTECTED] qdiskd[13740]: <crit> Node 2 is undead.
Apr 24 11:30:05 [EMAIL PROTECTED] qdiskd[13740]: <alert> Writing eviction notice for node 2
Apr 24 11:30:06 [EMAIL PROTECTED] qdiskd[13740]: <crit> Node 2 is undead.
Apr 24 11:30:06 [EMAIL PROTECTED] qdiskd[13740]: <alert> Writing eviction notice for node 2
Apr 24 11:30:07 [EMAIL PROTECTED] qdiskd[13740]: <crit> Node 2 is undead.
Apr 24 11:30:07 [EMAIL PROTECTED] qdiskd[13740]: <alert> Writing eviction notice for node 2
Apr 24 11:30:08 [EMAIL PROTECTED] qdiskd[13740]: <crit> Node 2 is undead.

The problem is that after node 1 reboots, when I try to start CS5 on it
again, cman fails with these messages in syslog:
Apr 24 11:47:02 [EMAIL PROTECTED] ccsd[11099]:  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
Apr 24 11:47:02 [EMAIL PROTECTED] ccsd[11099]: cluster.conf (cluster name = A0ha2, version = 1) found.
Apr 24 11:47:02 [EMAIL PROTECTED] ccsd[11099]: Remote copy of cluster.conf is from quorate node.
Apr 24 11:47:02 [EMAIL PROTECTED] ccsd[11099]:  Local version # : 1
Apr 24 11:47:02 [EMAIL PROTECTED] ccsd[11099]:  Remote version #: 1
Apr 24 11:47:02 [EMAIL PROTECTED] ccsd[11099]: Remote copy of cluster.conf is from quorate node.
Apr 24 11:47:02 [EMAIL PROTECTED] ccsd[11099]:  Local version # : 1
Apr 24 11:47:02 [EMAIL PROTECTED] ccsd[11099]:  Remote version #: 1
Apr 24 11:47:02 [EMAIL PROTECTED] ccsd[11099]: Remote copy of cluster.conf is from quorate node.
Apr 24 11:47:02 [EMAIL PROTECTED] ccsd[11099]:  Local version # : 1
Apr 24 11:47:02 [EMAIL PROTECTED] ccsd[11099]:  Remote version #: 1
Apr 24 11:47:02 [EMAIL PROTECTED] ccsd[11099]: Remote copy of cluster.conf is from quorate node.
Apr 24 11:47:02 [EMAIL PROTECTED] ccsd[11099]:  Local version # : 1
Apr 24 11:47:02 [EMAIL PROTECTED] ccsd[11099]:  Remote version #: 1
Apr 24 11:47:31 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 30 seconds.
Apr 24 11:48:01 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 60 seconds.
Apr 24 11:48:31 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 90 seconds.
Apr 24 11:48:37 [EMAIL PROTECTED] ntpd[6179]: synchronized to 192.168.64.99, stratum 11
Apr 24 11:48:37 [EMAIL PROTECTED] ntpd[6179]: kernel time sync enabled 0001
Apr 24 11:49:01 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 120 seconds.
Apr 24 11:49:31 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 150 seconds.
Apr 24 11:50:01 [EMAIL PROTECTED] crond[11455]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Apr 24 11:50:01 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 180 seconds.
Apr 24 11:50:31 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 210 seconds.
Apr 24 11:51:01 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 240 seconds.
Apr 24 11:51:31 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 270 seconds.
Apr 24 11:52:01 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 300 seconds.
Apr 24 11:52:31 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 330 seconds.
Apr 24 11:53:01 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 360 seconds.
Apr 24 11:53:31 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 390 seconds.
Apr 24 11:54:01 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 420 seconds.
Apr 24 11:54:31 [EMAIL PROTECTED] ccsd[11099]: Unable to connect to cluster infrastructure after 450 seconds ...
etc.

or sometimes these instead:
Apr 24 10:17:37 [EMAIL PROTECTED] ccsd[11023]: Cluster is not quorate.  Refusing connection.
Apr 24 10:17:37 [EMAIL PROTECTED] ccsd[11023]: Error while processing connect: Connection refused
Apr 24 10:17:37 [EMAIL PROTECTED] ccsd[11023]: Invalid descriptor specified (-111).
Apr 24 10:17:37 [EMAIL PROTECTED] ccsd[11023]: Someone may be attempting something evil.
Apr 24 10:17:37 [EMAIL PROTECTED] ccsd[11023]: Error while processing get: Invalid request descriptor
Apr 24 10:17:37 [EMAIL PROTECTED] ccsd[11023]: Invalid descriptor specified (-111).
Apr 24 10:17:37 [EMAIL PROTECTED] ccsd[11023]: Someone may be attempting something evil.
Apr 24 10:17:37 [EMAIL PROTECTED] ccsd[11023]: Error while processing get: Invalid request descriptor
Apr 24 10:17:37 [EMAIL PROTECTED] ccsd[11023]: Invalid descriptor specified (-21).
Apr 24 10:17:37 [EMAIL PROTECTED] ccsd[11023]: Someone may be attempting something evil.
Apr 24 10:17:37 [EMAIL PROTECTED] ccsd[11023]: Error while processing disconnect: Invalid request descriptor
Apr 24 10:17:37 [EMAIL PROTECTED] rgmanager: [11331]: <notice> Cluster Service Manager is stopped.


I can't start it again unless I first stop the Cluster Suite on both nodes.

My cluster.conf quorum disk entry is as follows:
<quorumd label="QDISK_2_0" interval="1" tko="10" votes="1" min_score="1">
     <heuristic interval="10" tko="3" program="ping -t1 -c1 192.168.64.99" score="1"/>
     <heuristic interval="10" program="ping -t3 -c1 192.168.64.99" score="1"/>
</quorumd>
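
For reference, the qdiskd heuristic scoring that this configuration relies on can be modeled roughly as below. This is a simplified sketch based on the qdisk(5) man page, not qdiskd's actual code: the function names are hypothetical, and timing (interval, tko) is ignored. With both heuristics pinging the same address (192.168.64.99), each with score="1", and min_score="1", a single passing heuristic is enough for a node to consider itself fit.

```python
import math


def default_min_score(scores):
    # Per qdisk(5), when min_score is omitted or 0 the threshold is
    # floor((n + 1) / 2), where n is the sum of all heuristic scores.
    return math.floor((sum(scores) + 1) / 2)


def node_is_fit(results, scores, min_score=0):
    # Hypothetical model: results[i] is True if heuristic i currently passes
    # (e.g. its ping succeeded); scores[i] is its score="..." attribute.
    threshold = min_score if min_score > 0 else default_min_score(scores)
    total = sum(score for ok, score in zip(results, scores) if ok)
    return total >= threshold


# The two heuristics above: score="1" each, with min_score="1".
scores = [1, 1]
print(node_is_fit([True, True], scores, min_score=1))    # True
print(node_is_fit([True, False], scores, min_score=1))   # True
print(node_is_fit([False, False], scores, min_score=1))  # False
print(default_min_score(scores))                         # 1
```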

This is urgent: does anyone have an idea what is causing this?

Thanks a lot
Regards.
Alain Moullé


--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster