Rolling back to the previous openais package allowed me to restart cman. I went from openais-0.80.3-22.el5 back to openais-0.80.3-15.el5.
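For anyone who wants to try the same rollback, here is a rough sketch of the steps. It assumes the older openais RPM has already been downloaded to the node; the file name and the x86_64 architecture in the example are placeholders, not something taken from this thread. The names run, rollback_openais and DRY_RUN are made up for this sketch.

```shell
#!/bin/sh
# Sketch only: downgrade openais and then try to start cman again.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
    if [ -n "$DRY_RUN" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

rollback_openais() {
    old_rpm="$1"    # path to the older package file (assumed to be on disk already)

    # Show which openais build is currently installed.
    run rpm -q openais

    # --oldpackage lets rpm replace a newer build with an older one.
    # Only try to start cman if the downgrade itself succeeded.
    run rpm -Uvh --oldpackage "$old_rpm" && run service cman start
}

# Example call (hypothetical file name, adjust to your architecture):
# rollback_openais ./openais-0.80.3-15.el5.x86_64.rpm
```

Running it once with DRY_RUN=1 shows what would be executed before anything is touched on a cluster node.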

2009/1/28 Dave Costakos <[email protected]>

> Like you, I've run into this same issue. I have 2 clusters that I'm trying
> to update in our lab. On one, I only updated the cman and rgmanager
> packages: this update was successful. On another I did a full update to 5.3
> and ran into what appears to be this same problem. I've noticed that
> manually attempting to start cman via 'cman_tool -d join' prints out this
> message right before cman fails.
>
> aisexec: ckpt.c:3961: message_handler_req_exec_ckpt_sync_checkpoint_refcount: Assertion `checkpoint != ((void *)0)' failed
>
> I suspect an openais issue. Would someone be able to confirm that?
>
> Also, I'm going to try downgrading openais back to the version from RHEL 5.2
> to see if that fixes it (though I won't get to that until the end of today).
> If that works, I'll report back.
>
> 2009/1/27 Alan A <[email protected]>
>
>> I just opened RHEL case number 1890184 regarding the same issue. First, the
>> kernel would not start due to the HP iLO driver conflict, but at the same
>> time CMAN broke, and fencing fails. I rolled back the cman rpm to the
>> previous version, but the problem persists. Something else must have
>> changed that keeps CMAN from starting.
>>
>> 2009/1/27 Gunther Schlegel <[email protected]>
>>
>>> Hello,
>>>
>>> I updated one node from 5.2 to 5.3 using yum update and now cman does not
>>> start up anymore -- looks like ccsd has some problems:
>>>
>>> [r...@motel6 /]# /sbin/ccsd -4 -n
>>> Starting ccsd 2.0.98:
>>> Built: Dec 3 2008 16:32:30
>>> Copyright (C) Red Hat, Inc. 2004 All rights reserved.
>>> IP Protocol:: IPv4 only
>>> No Daemon:: SET
>>>
>>> Cluster is not quorate. Refusing connection.
>>> Error while processing connect: Connection refused
>>> Cluster is not quorate. Refusing connection.
>>> Error while processing connect: Connection refused
>>> Unable to connect to cluster infrastructure after 30 seconds.
>>> Unable to connect to cluster infrastructure after 60 seconds.
>>>
>>> When starting ccsd using /etc/init.d/cman it reports all three nodes to
>>> be on cluster.conf version 78, so I guess it is not a network connectivity
>>> problem.
>>>
>>> The other two nodes (still on 5.2z) of the cluster are up and running
>>> with quorum. Openais is talking to those 2 other nodes and it looks fine to
>>> me:
>>>
>>> Jan 27 21:05:26 motel6 openais[1278]: [CLM ] Members Joined:
>>> Jan 27 21:05:26 motel6 openais[1278]: [CLM ] #011r(0) ip(10.11.5.22)
>>> Jan 27 21:05:26 motel6 openais[1278]: [CLM ] #011r(0) ip(10.11.5.23)
>>> Jan 27 21:05:26 motel6 openais[1278]: [SYNC ] This node is within the primary component and will provide service.
>>> Jan 27 21:05:26 motel6 openais[1278]: [TOTEM] entering OPERATIONAL state.
>>> Jan 27 21:05:26 motel6 openais[1278]: [CMAN ] quorum regained, resuming activity
>>> Jan 27 21:05:26 motel6 openais[1278]: [CLM ] got nodejoin message 10.11.5.21
>>> Jan 27 21:05:26 motel6 openais[1278]: [CLM ] got nodejoin message 10.11.5.22
>>> Jan 27 21:05:26 motel6 openais[1278]: [CLM ] got nodejoin message 10.11.5.23
>>>
>>> I am a bit lost...
>>>
>>> cluster.conf:
>>> [r...@motel6 init.d]# cat /etc/cluster/cluster.conf
>>> <?xml version="1.0"?>
>>> <cluster alias="RSIXENCluster2" config_version="87" name="RSIXENCluster2">
>>>   <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>>>   <clusternodes>
>>>     <clusternode name="concorde.riege.de" nodeid="1" votes="1">
>>>       <fence>
>>>         <method name="1">
>>>           <device name="Concorde_IPMI"/>
>>>         </method>
>>>       </fence>
>>>     </clusternode>
>>>     <clusternode name="motel6.riege.de" nodeid="2" votes="1">
>>>       <fence>
>>>         <method name="1">
>>>           <device name="Motel6_IPMI"/>
>>>         </method>
>>>       </fence>
>>>     </clusternode>
>>>     <clusternode name="mercure.riege.de" nodeid="3" votes="1">
>>>       <fence>
>>>         <method name="1">
>>>           <device name="Mercure_IPMI"/>
>>>         </method>
>>>       </fence>
>>>     </clusternode>
>>>   </clusternodes>
>>>   <fencedevices>
>>>     <fencedevice agent="fence_ipmilan" ipaddr="10.11.5.132"
>>>         login="root" name="Concorde_IPMI" passwd="XXX"/>
>>>     <fencedevice agent="fence_ipmilan" ipaddr="10.11.5.131"
>>>         login="root" name="Motel6_IPMI" passwd="xxx"/>
>>>     <fencedevice agent="fence_ipmilan" ipaddr="10.11.5.133"
>>>         login="root" name="Mercure_IPMI" passwd="XXX"/>
>>>   </fencedevices>
>>>   <rm>
>>>     <failoverdomains>
>>>       <failoverdomain name="Earth" nofailback="1" ordered="1" restricted="1">
>>>         <failoverdomainnode name="concorde.riege.de" priority="1"/>
>>>         <failoverdomainnode name="motel6.riege.de" priority="1"/>
>>>         <failoverdomainnode name="mercure.riege.de" priority="1"/>
>>>       </failoverdomain>
>>>       <failoverdomain name="Europe" nofailback="0" ordered="1" restricted="0">
>>>         <failoverdomainnode name="concorde.riege.de" priority="2"/>
>>>       </failoverdomain>
>>>       <failoverdomain name="North America" nofailback="0" ordered="1" restricted="0">
>>>         <failoverdomainnode name="motel6.riege.de" priority="2"/>
>>>       </failoverdomain>
>>>       <failoverdomain name="Africa" nofailback="0" ordered="1" restricted="0">
>>>         <failoverdomainnode name="mercure.riege.de" priority="1"/>
>>>       </failoverdomain>
>>>     </failoverdomains>
>>>     <resources/>
>>>     <vm autostart="1" domain="Africa" exclusive="0" migrate="live"
>>>         name="vm64.test.riege.de_64" path="/etc/xen" recovery="restart"/>
>>>     <vm autostart="1" domain="North America" exclusive="0" migrate="pause"
>>>         name="rt.test.riege.de_32" path="/etc/xen" recovery="restart"/>
>>>     <vm autostart="1" domain="Africa" exclusive="0" migrate="pause"
>>>         name="poincare.riege.de_32" path="/etc/xen" recovery="restart"/>
>>>     <vm autostart="1" domain="North America" exclusive="0" migrate="live"
>>>         name="jboss.dev.riege.de_64" path="/etc/xen" recovery="relocate"/>
>>>     <vm autostart="1" domain="Africa" exclusive="0" migrate="live"
>>>         name="master.cc3.dev.riege.de_64" path="/etc/xen" recovery="relocate"/>
>>>     <vm autostart="1" domain="Europe" exclusive="0" migrate="pause"
>>>         name="test.alphatrans.scope.riege.com_32" path="/etc/xen" recovery="relocate"/>
>>>     <vm autostart="1" domain="North America" exclusive="0" migrate="live"
>>>         name="slave.cc3.dev.riege.de_64" path="/etc/xen" recovery="restart"/>
>>>     <vm autostart="1" domain="North America" exclusive="0" migrate="live"
>>>         name="webmail.riege.com_64" path="/etc/xen" recovery="relocate"/>
>>>     <vm autostart="1" domain="Europe" exclusive="0" migrate="live"
>>>         name="live.rsi.scope.riege.com_64" path="/etc/xen" recovery="relocate"/>
>>>     <vm autostart="1" domain="Europe" exclusive="0" migrate="pause"
>>>         name="qa-16.rsi.scope.riege.com_32" path="/etc/xen" recovery="relocate"/>
>>>     <vm autostart="1" domain="Africa" exclusive="0" migrate="pause"
>>>         name="qa-18.rsi.scope.riege.com_32" path="/etc/xen" recovery="relocate"/>
>>>     <vm autostart="1" domain="Africa" exclusive="0" migrate="pause"
>>>         name="vm32.test.riege.de_32" path="/etc/xen" recovery="restart"/>
>>>     <vm autostart="1" domain="Europe" exclusive="0" migrate="pause"
>>>         name="qa-head.rsi.scope.riege.com_32" path="/etc/xen" recovery="restart"/>
>>>     <vm autostart="1" domain="North America" exclusive="0" migrate="live"
>>>         name="mq.dev.riege.de_64" path="/etc/xen" recovery="relocate"/>
>>>     <vm autostart="1" domain="Europe" exclusive="0" migrate="live"
>>>         name="archive.dev.riege.de_64" path="/etc/xen" recovery="restart"/>
>>>   </rm>
>>>   <cman quorum_dev_poll="50000"/>
>>>   <totem consensus="4800" join="60" token="60000"
>>>       token_retransmits_before_loss_const="20"/>
>>>   <quorumd device="/dev/mapper/Quorum_Partition" interval="3"
>>>       min_score="1" tko="10" votes="2"/>
>>> </cluster>
>>>
>>> best regards, Gunther
>>>
>>> --
>>> .............................................................
>>> Riege Software International GmbH  Fon: +49 (2159) 9148 0
>>> Mollsfeld 10                       Fax: +49 (2159) 9148 11
>>> 40670 Meerbusch                    Web: www.riege.com
>>> Germany                            E-Mail: [email protected]
>>> ---                                ---
>>> Handelsregister:                   Managing Directors:
>>> Amtsgericht Neuss HRB-NR 4207      Christian Riege
>>> USt-ID-Nr.: DE120585842            Gabriele Riege
>>>                                    Johannes Riege
>>> .............................................................
>>> YOU CARE FOR FREIGHT, WE CARE FOR YOU
>>>
>>> --
>>> Linux-cluster mailing list
>>> [email protected]
>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>> --
>> Alan A.
>
> --
> Dave Costakos
> mailto:[email protected]

--
Alan A.
