Re: [Linux-cluster] Physical shutdown of one node causes both node to crash in active/passive configuration of 2 node RHEL cluster

Amjad Syed Wed, 10 Sep 2014 02:09:03 -0700

Digimer,

I have applied the changes, but it looks like the cluster goes into a fence
loop. That is, when node1 is running cman and I reboot node2, it fences node1
and the two nodes get into a loop.


1) On both nodes acpid is off

krplporcl001 ~]# service acpid status
acpid is stopped

krplporcl002 ~]# service acpid status
acpid is stopped

2) Changes in cluster.conf:

<clusternode name="krplporcl001" nodeid="1">
    <fence>
        <method name="1">
            <device lanplus="" name="inspuripmi" delay="15" action="reboot"/>
        </method>
    </fence>
</clusternode>
<clusternode name="krplporcl002" nodeid="2">
    <fence>
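
As I understand the advice quoted below, the delay should sit on one node's
device line only, so that one node gets a head start in a fence race. A minimal
sketch to confirm that, assuming the config lives at the usual
/etc/cluster/cluster.conf (the function name count_fence_delays is made up for
illustration):

```shell
#!/bin/sh
# Hypothetical helper: count the <device .../> elements in a cluster.conf-style
# file that carry a delay attribute. <fencedevice> definitions are not counted.
count_fence_delays() {
    conf="${1:-/etc/cluster/cluster.conf}"
    # Join lines first so a <device> element split over two lines still matches.
    n=$(tr '\n' ' ' < "$conf" \
        | grep -oE '<device[^>]*>' \
        | grep -cE '[[:space:]]delay[[:space:]]*=' || true)
    echo "${n:-0}"
}
```

Calling count_fence_delays with no argument on each node should print 1 if
only one node's device line has the delay; any other value suggests the change
was applied to both nodes or to neither.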


3) Bonding uses mode = 1 only

On krplporcl001:

DEVICE=bond0
IPADDR=192.168.10.10
NETMASK=255.255.255.0
NETWORK=192.168.10.0
BROADCAST=192.168.10.255
BOOTPROTO=none
Type=Ethernet
ONBOOT=yes
BONDING_OPTS='miimon=100 mode=1'

On krplporcl002:

DEVICE=bond0
IPADDR=192.168.10.11
NETMASK=255.255.255.0
NETWORK=192.168.10.0
BROADCAST=192.168.10.255
BOOTPROTO=none
Type=Ethernet
ONBOOT=yes
BONDING_OPTS='miimon=100 mode=1'
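
The ifcfg files only show what was requested. The kernel reports what the bond
is actually doing in /proc/net/bonding/bond0. A small sketch for pulling out the
relevant lines, assuming the bond is named bond0 as above (bond_status is a
made-up helper name):

```shell
#!/bin/sh
# Hypothetical helper: show the bonding mode, the currently active slave and
# the MII link status lines from the kernel's bonding status file.
bond_status() {
    status_file="${1:-/proc/net/bonding/bond0}"
    grep -E '^(Bonding Mode|Currently Active Slave|MII Status):' "$status_file"
}
```

With mode=1 the "Bonding Mode" line should mention active-backup. Running
bond_status on the surviving node while the other one is shutting down may show
whether a link is dropping at that moment.
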
4) I have put one switch between the nodes, as Sivaji suggested.

As soon as
The logs on krplporcl001 are as follows:
Sep 10 11:47:53 krplporcl001 fenced[5977]: fencing node krplporcl002

The logs on krplporcl002 are as follows:

Sep 10 11:46:48 krplporcl002 fenced[2950]: fencing node krplporcl001

I am not sure why the network is breaking and why the two nodes cannot
communicate with each other.

Are there any other places I should look for logs?
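
One way to line up the events on both nodes is to filter /var/log/messages for
the fence daemon together with the cluster and bonding messages around it. A
rough sketch, assuming syslog goes to /var/log/messages and that the daemons
log under the tags fenced and corosync and the kernel under "bonding:" (the tag
names other than fenced are an assumption, and fence_events is a made-up
helper name):

```shell
#!/bin/sh
# Hypothetical helper: print log lines from the fence daemon, corosync and the
# bonding driver so their timestamps can be compared between the two nodes.
fence_events() {
    log_file="${1:-/var/log/messages}"
    grep -E 'fenced\[|corosync\[|bonding:' "$log_file"
}
```

Running fence_events on both nodes and comparing the minute or two before each
"fencing node" line (11:46:48 on krplporcl002, 11:47:53 on krplporcl001) may
show which side lost contact first.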



On Wed, Sep 10, 2014 at 11:28 AM, Amjad Syed <amjad...@gmail.com> wrote:

>
>
> On Tue, Sep 9, 2014 at 11:53 AM, Digimer <li...@alteeve.ca> wrote:
>
>> On 09/09/14 03:14 AM, Amjad Syed wrote:
>>
>>> <device lanplus = "" name="inspuripmi"  action ="reboot"/>
>>>
>>
>> Something is breaking the network during the shutdown, a fence is being
>> called and both nodes are killing the other, causing a dual fence. So you
>> have a set of problems, I think.
>>
>> First, disable acpid on both nodes.
>>
>> Second, change the quoted line (only) to:
>>
>> <device lanplus = "" name="inspuripmi" delay="15" action ="reboot"/>
>>
>> If I am right, this will mean that 192.168.10.10 will stay up and fence .11.
>>
>> Third, what bonding mode are you using? I would only use mode=1.
>>
>> Fourth, please set the node names to match 'uname -n' on both nodes. Be
>> sure the names translate to the IPs you want (via /etc/hosts, ideally).
>>
>> Fifth, as Sivaji suggested, please put switch(es) between the nodes.
>>
>> If it still tries to fence when a node shuts down (watch
>> /var/log/messages and look for 'fencing node ...'), please paste your logs
>> from both nodes.
>>
>> --
>> Digimer
>> Papers and Projects: https://alteeve.ca/w/
>> What if the cure for cancer is trapped in the mind of a person without
>> access to education?
>>
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster@redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
>

