On Mon, Jun 15, 2009 at 4:17 PM, Moralejo, Alfredo < [email protected]> wrote:
> Hi,
>
> I’m having what I think is a timeout issue in my cluster.
>
> I have a two-node cluster using qdisk. Every time the node that holds the
> master role for qdisk goes down (through a failure or even by stopping qdiskd
> manually), packages on the surviving node are stopped because of the lack of
> quorum, as qdiskd becomes unresponsive until the second node becomes master
> and starts working properly. Once qdiskd is working again (usually 5-6
> seconds), the packages are started again.
>
> I’ve read the cluster manual section on “CMAN membership timeout value”
> and I think this is the case. I’m using RHEL 5.3, and I thought this
> parameter was the token, which I set much longer than needed:
>
> <cluster alias="CLUSTER_ENG" config_version="75" name="CLUSTER_ENG">
>
> <totem token="50000"/>
>
> …
>
> <quorumd device="/dev/mapper/mpathquorump1" interval="3"
> status_file="/tmp/qdisk" tko="3" votes="5" log_level="7"
> log_facility="local4"/>
>
> The totem token is much more than double the qdisk timeout, so I guess it
> should be enough, but every time qdisk dies on the master node I get the same
> result, services restarted on the surviving node:
>
> Jun 15 16:11:33 rmamseslab07 qdiskd[14130]: <debug> Node 1 missed an update (2/3)
> Jun 15 16:11:38 rmamseslab07 qdiskd[14130]: <debug> Node 1 missed an update (3/3)
> Jun 15 16:11:43 rmamseslab07 qdiskd[14130]: <debug> Node 1 missed an update (4/3)
> Jun 15 16:11:43 rmamseslab07 qdiskd[14130]: <debug> Node 1 DOWN
> Jun 15 16:11:43 rmamseslab07 qdiskd[14130]: <debug> Making bid for master
> Jun 15 16:11:44 rmamseslab07 clurgmgrd: [18510]: <info> Executing /etc/init.d/watchdog status
> Jun 15 16:11:48 rmamseslab07 qdiskd[14130]: <debug> Node 1 missed an update (5/3)
> Jun 15 16:11:53 rmamseslab07 qdiskd[14130]: <debug> Node 1 missed an update (6/3)
> *Jun 15 16:11:53 rmamseslab07 qdiskd[14130]: <info> Assuming master role*
>
> Message from sysl...@rmamseslab07 at Jun 15 16:11:53 ...
> clurgmgrd[18510]: <emerg> #1: Quorum Dissolved
>
> Jun 15 16:11:53 rmamseslab07 openais[14087]: [CMAN ] lost contact with quorum device
> Jun 15 16:11:53 rmamseslab07 openais[14087]: [CMAN ] quorum lost, blocking activity
> Jun 15 16:11:53 rmamseslab07 clurgmgrd[18510]: <debug> Membership Change Event
> *Jun 15 16:11:53 rmamseslab07 clurgmgrd[18510]: <emerg> #1: Quorum Dissolved*
> Jun 15 16:11:53 rmamseslab07 clurgmgrd[18510]: <debug> Emergency stop of service:Cluster_test_2
> Jun 15 16:11:53 rmamseslab07 clurgmgrd[18510]: <debug> Emergency stop of service:wdtcscript-rmamseslab05-ic
> Jun 15 16:11:53 rmamseslab07 clurgmgrd[18510]: <debug> Emergency stop of service:wdtcscript-rmamseslab07-ic
> Jun 15 16:11:54 rmamseslab07 clurgmgrd[18510]: <debug> Emergency stop of service:Logical volume 1
> Jun 15 16:11:58 rmamseslab07 qdiskd[14130]: <debug> Node 1 missed an update (7/3)
> Jun 15 16:11:58 rmamseslab07 qdiskd[14130]: <notice> Writing eviction notice for node 1
> Jun 15 16:11:58 rmamseslab07 qdiskd[14130]: <debug> Telling CMAN to kill the node
> *Jun 15 16:11:58 rmamseslab07 openais[14087]: [CMAN ] quorum regained, resuming activity*
>
> I’ve just logged a case but… any idea????
>
> Regards,

Hi!

Have you set two_node="0" in the cman section? Why don't you use any
heuristics within the quorumd configuration? E.g. pinging a router...
Could you paste us your cluster.conf?

Greetings,
Juanra

> *Alfredo Moralejo*
> Business Platforms Engineering - OS Servers - UNIX Senior Specialist
> F. Hoffmann-La Roche Ltd.
> Global Informatics Group Infrastructure
> Josefa Valcárcel, 40
> 28027 Madrid SPAIN
> Phone: +34 91 305 97 87
> [email protected]
>
> *Confidentiality Note:* This message is intended only for the use of the
> named recipient(s) and may contain confidential and/or proprietary
> information. If you are not the intended recipient, please contact the
> sender and delete this message. Any unauthorized use of the information
> contained in this message is prohibited.
>
> --
> Linux-cluster mailing list
> [email protected]
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
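To make the two questions above concrete, here is a rough sketch of a cman section with two_node="0" and a quorumd section with one ping heuristic. The device, interval, tko and votes values are copied from the quoted configuration. Everything inside the heuristic element is an assumption for illustration only: 192.0.2.1 is a documentation placeholder standing in for a real router, and the score, interval and tko values are not tuned for this cluster. The element and attribute names follow the qdisk(5) man page as I remember it, so please check them against your installed version.

```xml
<!-- Sketch only: expected_votes is left out because the node vote counts
     are not shown in the quoted configuration. -->
<cman two_node="0"/>

<quorumd device="/dev/mapper/mpathquorump1" interval="3" tko="3" votes="5">
  <!-- Hypothetical heuristic: 192.0.2.1 stands in for a real router address.
       score/interval/tko are illustrative values, not recommendations. -->
  <heuristic program="ping -c1 -w1 192.0.2.1" score="1" interval="2" tko="3"/>
</quorumd>
```

With a heuristic like this, a node that loses sight of the router scores below the minimum and stops counting itself as fit to hold the quorum disk. Whether that changes the master-election delay seen in the logs above is something only testing on the cluster can tell.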
