On Wed, Jul 23, 2008 at 06:56:40PM -0300, Tiago Cruz wrote:
> Hello,
> 
> I have one machine (hotsite-bsb-la-1) exporting GNBD to two machines 
> (hotsite-bsb-la-2 and "-3")
> 
> The cluster with RHEL 5.2 x86_64 and GFS was working very well, until I rebooted 
> hotsite-bsb-la-2:
> 
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] CLM CONFIGURATION 
> CHANGE 
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] New Configuration: 
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ]       r(0) 
> ip(10.65.13.30)  
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ]       r(0) 
> ip(10.65.13.33)  
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] Members Left: 
> Jul 23 18:56:38 hotsite-bsb-la-1 kernel: dlm: closing connection to node 2
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ]       r(0) 
> ip(10.65.13.31)  
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] Members Joined: 
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] CLM CONFIGURATION 
> CHANGE 
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] New Configuration: 
> Jul 23 18:56:38 hotsite-bsb-la-1 fenced[3099]: hotsite-bsb-la-2.com not a 
> cluster member after 0 sec post_fail_delay
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ]       r(0) 
> ip(10.65.13.30)  
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ]       r(0) 
> ip(10.65.13.33)  
> Jul 23 18:56:38 hotsite-bsb-la-1 fenced[3099]: fencing node 
> "hotsite-bsb-la-2.com"
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] Members Left: 
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] Members Joined: 
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [SYNC ] This node is within 
> the primary component and will provide service. 
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [TOTEM] entering OPERATIONAL 
> state. 
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] got nodejoin message 
> 10.65.13.30 
> Jul 23 18:56:38 hotsite-bsb-la-1 fenced[3099]: fence "hotsite-bsb-la-2.com" 
> failed
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] got nodejoin message 
> 10.65.13.33 
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CPG  ] got joinlist message 
> from node 1 
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CPG  ] got joinlist message 
> from node 3 
> Jul 23 18:56:43 hotsite-bsb-la-1 fenced[3099]: fencing node 
> "hotsite-bsb-la-2.com.br"
> Jul 23 18:56:43 hotsite-bsb-la-1 fenced[3099]: fence 
> "hotsite-bsb-la-2.com.br" failed
> Jul 23 19:00:57 hotsite-bsb-la-1 last message repeated 50 times
> 
> Why is fencing failing? Here is the cluster.conf:
> 
> <?xml version="1.0"?>
> <cluster alias="hotsites" config_version="18" name="hotsites">
>       <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>       <clusternodes>
>               <clusternode name="hotsite-bsb-la-1.com" nodeid="1" votes="1">
>               <fence/>
>               </clusternode>
>               <clusternode name="hotsite-bsb-la-2.com" nodeid="2" votes="1">
>               <fence>
>                  <method name="single">
>                       <device name="gnbd" nodename="hotsite-bsb-la-2.com"/>
>                  </method>
>               </fence>
>               </clusternode>
>               <clusternode name="hotsite-bsb-la-3.com" nodeid="3" votes="1">
>               <fence>
>                  <method name="single">
>                       <device name="gnbd" nodename="hotsite-bsb-la-3.com"/>
>                  </method>
>               </fence>
>               </clusternode>
>       </clusternodes>
>       <cman/>
>       <fencedevices>
>               <fencedevice agent="fence_gnbd" name="hotsite" 
> servers="hotsite-1.com"/>
>       </fencedevices>
>       <rm>
>               <failoverdomains/>
>               <resources>
>                       <clusterfs device="/dev/gnbd/hotsite" force_unmount="1" 
> fsid="5666" fstype="gfs" mountpoint="/data" name="data" self_fence="1"/>
>               </resources>
>       </rm>
>       <totem consensus="4800" join="60" token="10000" 
> token_retransmits_before_loss_const="20"/>
> </cluster>
>

There are two problems with your cluster.conf file that may be causing
this.

1. In each clusternode's <device> line, "name" must match the "name" of
the corresponding <fencedevice> line. Your <device> lines use "gnbd", but
your <fencedevice> is named "hotsite".

2. In the <fencedevice> line, "servers" must list the GNBD server using the
"name" from its <clusternode> line. You have "hotsite-1.com", but the
clusternode is named "hotsite-bsb-la-1.com".

So, for your configuration, the <fencedevice> line should be:

<fencedevice agent="fence_gnbd" name="gnbd" servers="hotsite-bsb-la-1.com"/>
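
If you want to catch this kind of mismatch before pushing a new config,
the two rules above can be checked mechanically. What follows is only a
rough sketch, not a tool that ships with the cluster suite: the function
name check_fence_config is made up for illustration, and it assumes that
"servers" holds a whitespace-separated list of host names.

```python
import xml.etree.ElementTree as ET


def check_fence_config(xml_text):
    """Return a list of problems found in a cluster.conf given as a string.

    Hypothetical helper: it only checks the two naming rules described
    above and knows nothing else about cluster.conf.
    """
    root = ET.fromstring(xml_text)
    node_names = {n.get("name") for n in root.iter("clusternode")}
    fence_devices = {d.get("name"): d for d in root.iter("fencedevice")}
    problems = []

    # Rule 1: every <device name="..."> under a clusternode must refer
    # to a <fencedevice> with the same name.
    for node in root.iter("clusternode"):
        for dev in node.iter("device"):
            if dev.get("name") not in fence_devices:
                problems.append(
                    "%s: device '%s' has no matching <fencedevice>"
                    % (node.get("name"), dev.get("name"))
                )

    # Rule 2: every server listed on a fence_gnbd <fencedevice> must be
    # the name of a <clusternode>.
    # Assumption: "servers" is a whitespace-separated list.
    for name, fdev in fence_devices.items():
        if fdev.get("agent") != "fence_gnbd":
            continue
        for server in (fdev.get("servers") or "").split():
            if server not in node_names:
                problems.append(
                    "fencedevice '%s': server '%s' is not a clusternode name"
                    % (name, server)
                )

    return problems
```

Run against the cluster.conf you posted (passed in as a string), this
should report both problems; with the corrected <fencedevice> line it
should return an empty list.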

See if this helps.

-Ben
 
> 
> 
> # cman_tool status
> Version: 6.1.0
> Config Version: 18
> Cluster Name: hotsites
> Cluster Id: 27589
> Cluster Member: Yes
> Cluster Generation: 184
> Membership state: Cluster-Member
> Nodes: 2
> Expected votes: 3
> Total votes: 2
> Quorum: 2  
> Active subsystems: 8
> Flags: Dirty 
> Ports Bound: 0 177  
> Node name: hotsite-bsb-la-1.com
> Node ID: 1
> Multicast addresses: 239.192.107.49 
> Node addresses: 10.65.13.30 
> 
> 
> Thanks
> 
> -- 
> Tiago Cruz
> http://everlinux.com
> Linux User #282636
> 
> 
> --
> Linux-cluster mailing list
> [email protected]
> https://www.redhat.com/mailman/listinfo/linux-cluster
