Confirmed. Same here. It still seems like a bug to me, though. I would hope we have the ability to do rolling upgrades of openais in our RHEL clusters.
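For anyone else hitting this, here is a minimal sketch of the rollback being described; the package file name, arch, and the `is_older` helper are my own assumptions (pull the 5.2-era rpm from your own repo and adjust accordingly):

```shell
#!/bin/sh
# Rough sketch of rolling openais back to the RHEL 5.2 build.
# is_older is a hypothetical helper: returns 0 if version string $1 sorts
# before $2 (uses sort -V; rpm's real EVR comparison has more rules).
is_older() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

TARGET=openais-0.80.3-15.el5                   # known-good build from RHEL 5.2
CURRENT=$(rpm -q openais 2>/dev/null || echo "")

if [ -n "$CURRENT" ] && is_older "$TARGET" "$CURRENT"; then
    service cman stop                           # stop the cluster stack first
    rpm -Uvh --oldpackage "${TARGET}.x86_64.rpm"  # file name/arch assumed
    service cman start
fi
```

Do this one node at a time and check quorum before moving on.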
2009/1/28 Alan A <[email protected]>

> Rolling back to the previous openais package allowed me to restart cman:
> from openais-0.80.3-22.el5 to openais-0.80.3-15.el5.
>
> 2009/1/28 Dave Costakos <[email protected]>
>
>> Like you, I've run into this same issue. I have 2 clusters that I'm
>> trying to update in our lab. On one, I only updated the cman and
>> rgmanager packages: this update was successful. On the other I did a
>> full update to 5.3 and ran into what appears to be this same problem.
>> I've noticed that manually attempting to start cman via 'cman_tool -d
>> join' prints this message right before cman fails:
>>
>> aisexec: ckpt.c:3961:
>> message_handler_req_exec_ckpt_sync_checkpoint_refcount: Assertion
>> `checkpoint != ((void *)0)' failed
>>
>> I suspect an openais issue; would someone be able to confirm that?
>>
>> Also, I'm going to try downgrading openais back to the version from
>> RHEL 5.2 to see if that fixes it (though I won't get to that until the
>> end of today). If that works, I'll report back.
>>
>> 2009/1/27 Alan A <[email protected]>
>>
>>> I just opened RHEL case number 1890184 regarding the same issue.
>>> First, the kernel would not start due to the HP iLO driver conflict,
>>> but at the same time CMAN broke and fencing fails. I rolled the cman
>>> rpm back to the previous version, but the problem persists. Something
>>> else changed that keeps CMAN from starting.
>>>
>>> 2009/1/27 Gunther Schlegel <[email protected]>
>>>
>>>> Hello,
>>>>
>>>> I updated one node from 5.2 to 5.3 using yum update and now cman does
>>>> not start up anymore -- it looks like ccsd has some problems:
>>>>
>>>> [r...@motel6 /]# /sbin/ccsd -4 -n
>>>> Starting ccsd 2.0.98:
>>>> Built: Dec 3 2008 16:32:30
>>>> Copyright (C) Red Hat, Inc. 2004 All rights reserved.
>>>> IP Protocol:: IPv4 only
>>>> No Daemon:: SET
>>>>
>>>> Cluster is not quorate. Refusing connection.
>>>> Error while processing connect: Connection refused
>>>> Cluster is not quorate. Refusing connection.
>>>> Error while processing connect: Connection refused
>>>> Unable to connect to cluster infrastructure after 30 seconds.
>>>> Unable to connect to cluster infrastructure after 60 seconds.
>>>>
>>>> When starting ccsd using /etc/init.d/cman it reports all three nodes
>>>> to be on cluster.conf version 78, so I guess it is not a network
>>>> connectivity problem.
>>>>
>>>> The other two nodes (still on 5.2z) of the cluster are up and running
>>>> with quorum. Openais is talking to those two other nodes and it looks
>>>> fine to me:
>>>>
>>>> Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] Members Joined:
>>>> Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] #011r(0) ip(10.11.5.22)
>>>> Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] #011r(0) ip(10.11.5.23)
>>>> Jan 27 21:05:26 motel6 openais[1278]: [SYNC ] This node is within the primary component and will provide service.
>>>> Jan 27 21:05:26 motel6 openais[1278]: [TOTEM] entering OPERATIONAL state.
>>>> Jan 27 21:05:26 motel6 openais[1278]: [CMAN ] quorum regained, resuming activity
>>>> Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] got nodejoin message 10.11.5.21
>>>> Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] got nodejoin message 10.11.5.22
>>>> Jan 27 21:05:26 motel6 openais[1278]: [CLM  ] got nodejoin message 10.11.5.23
>>>>
>>>> I am a bit lost...
>>>>
>>>> cluster.conf:
>>>> [r...@motel6 init.d]# cat /etc/cluster/cluster.conf
>>>> <?xml version="1.0"?>
>>>> <cluster alias="RSIXENCluster2" config_version="87" name="RSIXENCluster2">
>>>>   <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>>>>   <clusternodes>
>>>>     <clusternode name="concorde.riege.de" nodeid="1" votes="1">
>>>>       <fence>
>>>>         <method name="1">
>>>>           <device name="Concorde_IPMI"/>
>>>>         </method>
>>>>       </fence>
>>>>     </clusternode>
>>>>     <clusternode name="motel6.riege.de" nodeid="2" votes="1">
>>>>       <fence>
>>>>         <method name="1">
>>>>           <device name="Motel6_IPMI"/>
>>>>         </method>
>>>>       </fence>
>>>>     </clusternode>
>>>>     <clusternode name="mercure.riege.de" nodeid="3" votes="1">
>>>>       <fence>
>>>>         <method name="1">
>>>>           <device name="Mercure_IPMI"/>
>>>>         </method>
>>>>       </fence>
>>>>     </clusternode>
>>>>   </clusternodes>
>>>>   <fencedevices>
>>>>     <fencedevice agent="fence_ipmilan" ipaddr="10.11.5.132" login="root" name="Concorde_IPMI" passwd="XXX"/>
>>>>     <fencedevice agent="fence_ipmilan" ipaddr="10.11.5.131" login="root" name="Motel6_IPMI" passwd="xxx"/>
>>>>     <fencedevice agent="fence_ipmilan" ipaddr="10.11.5.133" login="root" name="Mercure_IPMI" passwd="XXX"/>
>>>>   </fencedevices>
>>>>   <rm>
>>>>     <failoverdomains>
>>>>       <failoverdomain name="Earth" nofailback="1" ordered="1" restricted="1">
>>>>         <failoverdomainnode name="concorde.riege.de" priority="1"/>
>>>>         <failoverdomainnode name="motel6.riege.de" priority="1"/>
>>>>         <failoverdomainnode name="mercure.riege.de" priority="1"/>
>>>>       </failoverdomain>
>>>>       <failoverdomain name="Europe" nofailback="0" ordered="1" restricted="0">
>>>>         <failoverdomainnode name="concorde.riege.de" priority="2"/>
>>>>       </failoverdomain>
>>>>       <failoverdomain name="North America" nofailback="0" ordered="1" restricted="0">
>>>>         <failoverdomainnode name="motel6.riege.de" priority="2"/>
>>>>       </failoverdomain>
>>>>       <failoverdomain name="Africa" nofailback="0" ordered="1" restricted="0">
>>>>         <failoverdomainnode name="mercure.riege.de" priority="1"/>
>>>>       </failoverdomain>
>>>>     </failoverdomains>
>>>>     <resources/>
>>>>     <vm autostart="1" domain="Africa" exclusive="0" migrate="live" name="vm64.test.riege.de_64" path="/etc/xen" recovery="restart"/>
>>>>     <vm autostart="1" domain="North America" exclusive="0" migrate="pause" name="rt.test.riege.de_32" path="/etc/xen" recovery="restart"/>
>>>>     <vm autostart="1" domain="Africa" exclusive="0" migrate="pause" name="poincare.riege.de_32" path="/etc/xen" recovery="restart"/>
>>>>     <vm autostart="1" domain="North America" exclusive="0" migrate="live" name="jboss.dev.riege.de_64" path="/etc/xen" recovery="relocate"/>
>>>>     <vm autostart="1" domain="Africa" exclusive="0" migrate="live" name="master.cc3.dev.riege.de_64" path="/etc/xen" recovery="relocate"/>
>>>>     <vm autostart="1" domain="Europe" exclusive="0" migrate="pause" name="test.alphatrans.scope.riege.com_32" path="/etc/xen" recovery="relocate"/>
>>>>     <vm autostart="1" domain="North America" exclusive="0" migrate="live" name="slave.cc3.dev.riege.de_64" path="/etc/xen" recovery="restart"/>
>>>>     <vm autostart="1" domain="North America" exclusive="0" migrate="live" name="webmail.riege.com_64" path="/etc/xen" recovery="relocate"/>
>>>>     <vm autostart="1" domain="Europe" exclusive="0" migrate="live" name="live.rsi.scope.riege.com_64" path="/etc/xen" recovery="relocate"/>
>>>>     <vm autostart="1" domain="Europe" exclusive="0" migrate="pause" name="qa-16.rsi.scope.riege.com_32" path="/etc/xen" recovery="relocate"/>
>>>>     <vm autostart="1" domain="Africa" exclusive="0" migrate="pause" name="qa-18.rsi.scope.riege.com_32" path="/etc/xen" recovery="relocate"/>
>>>>     <vm autostart="1" domain="Africa" exclusive="0" migrate="pause" name="vm32.test.riege.de_32" path="/etc/xen" recovery="restart"/>
>>>>     <vm autostart="1" domain="Europe" exclusive="0" migrate="pause" name="qa-head.rsi.scope.riege.com_32" path="/etc/xen" recovery="restart"/>
>>>>     <vm autostart="1" domain="North America" exclusive="0" migrate="live" name="mq.dev.riege.de_64" path="/etc/xen" recovery="relocate"/>
>>>>     <vm autostart="1" domain="Europe" exclusive="0" migrate="live" name="archive.dev.riege.de_64" path="/etc/xen" recovery="restart"/>
>>>>   </rm>
>>>>   <cman quorum_dev_poll="50000"/>
>>>>   <totem consensus="4800" join="60" token="60000" token_retransmits_before_loss_const="20"/>
>>>>   <quorumd device="/dev/mapper/Quorum_Partition" interval="3" min_score="1" tko="10" votes="2"/>
>>>> </cluster>
>>>>
>>>> best regards, Gunther
>>>>
>>>> --
>>>> .............................................................
>>>> Riege Software International GmbH    Fon: +49 (2159) 9148 0
>>>> Mollsfeld 10                         Fax: +49 (2159) 9148 11
>>>> 40670 Meerbusch                      Web: www.riege.com
>>>> Germany                              E-Mail: [email protected]
>>>> ---                                  ---
>>>> Handelsregister:                     Managing Directors:
>>>> Amtsgericht Neuss HRB-NR 4207        Christian Riege
>>>> USt-ID-Nr.: DE120585842              Gabriele Riege
>>>>                                      Johannes Riege
>>>> .............................................................
>>>> YOU CARE FOR FREIGHT, WE CARE FOR YOU
>>>
>>> --
>>> Alan A.
>>
>> --
>> Dave Costakos
>> mailto:[email protected]
>
> --
> Alan A.

--
Dave Costakos
mailto:[email protected]
--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster
