Here is my cluster.conf

#########################################

<?xml version="1.0"?>
<cluster alias="myiacon" config_version="16" name="myiacon">
        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="60"/>
        <clusternodes>
                <clusternode name="ratchet.local" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="ratchet_ipmi"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="skydive.local" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="skydive_ipmi"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="wheeljack.local" nodeid="3" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="wheeljack_ipmi"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman/>
        <fencedevices>
                <fencedevice agent="fence_ipmilan" ipaddr="192.168.1.100" login="root" name="ratchet_ipmi" passwd="xxxxx"/>
                <fencedevice agent="fence_ipmilan" ipaddr="192.168.1.102" login="root" name="skydive_ipmi" passwd="xxxxx"/>
                <fencedevice agent="fence_ipmilan" ipaddr="192.168.1.101" login="root" name="wheeljack_ipmi" passwd="xxxxxx"/>
        </fencedevices>
        <rm>
                <failoverdomains/>
                <resources/>
        </rm>
</cluster>

#############################################
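
As a quick sanity check on a config like the one above, the sketch below confirms that every fence device a node refers to is actually declared under <fencedevices>. It is only an illustration: the function name undefined_fence_devices and the trimmed SAMPLE (with the skydive_ipmi declaration deliberately left out to show a hit) are made up for this example and are not part of any cluster tooling. Run against the config as posted, it should return an empty list, since all three devices are declared.

```python
import xml.etree.ElementTree as ET


def undefined_fence_devices(xml_text):
    """Return (node, device) pairs whose fence device is not declared.

    Hypothetical helper for illustration; not part of cman or rgmanager.
    """
    root = ET.fromstring(xml_text.strip())
    # Names declared in <fencedevices>.
    declared = {fd.get("name")
                for fd in root.findall("./fencedevices/fencedevice")}
    missing = []
    # Every <device> referenced from a node's <fence><method> block.
    for node in root.findall("./clusternodes/clusternode"):
        for dev in node.findall("./fence/method/device"):
            if dev.get("name") not in declared:
                missing.append((node.get("name"), dev.get("name")))
    return missing


# Trimmed sample; the skydive_ipmi declaration is omitted on purpose.
SAMPLE = """
<cluster name="myiacon" config_version="16">
  <clusternodes>
    <clusternode name="ratchet.local" nodeid="1" votes="1">
      <fence><method name="1"><device name="ratchet_ipmi"/></method></fence>
    </clusternode>
    <clusternode name="skydive.local" nodeid="2" votes="1">
      <fence><method name="1"><device name="skydive_ipmi"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="192.168.1.100"
                 login="root" name="ratchet_ipmi" passwd="xxxxx"/>
  </fencedevices>
</cluster>
"""

if __name__ == "__main__":
    for node, dev in undefined_fence_devices(SAMPLE):
        print(f"{node}: fence device {dev!r} is not declared")
```

To check a real file, read it and pass its contents to the same function, for example undefined_fence_devices(open("/etc/cluster/cluster.conf").read()), assuming that is where the config lives on the node.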

And here is one of the errors I just started getting:

Sep 29 08:10:06 wheeljack openais[5453]: [MAIN ] Killing node ratchet.local because it has rejoined the cluster with existing state

But half the time, the servers just complain that they can't reconnect to the
cluster.


-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Mark Chaney
Sent: Monday, September 29, 2008 3:07 AM
To: [email protected]
Subject: [Linux-cluster] proper cluster crash procedures?

I have a 3-node cluster with shared storage on an iSCSI SAN, hence I am
using GFS. It crashed for whatever reason (I'm not sure whether something
was rebooted incorrectly or what), and I have now spent the past 2 hours
trying to get the cluster back up. I would think that simply rebooting all
the nodes would work, but it hasn't. What should I be doing? Should I just
start the nodes up one at a time? BTW, I am using IPMI for fencing, if that
makes a difference. I can post my cluster.conf if that's helpful, but I
would think there would be general techniques available.

Thanks,
Mark




--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster
