Do you have all of your cluster services chkconfig'd on at node 2? It sounds to me like clvmd might be chkconfig'd off.
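A quick way to check is to look at the init scripts on node 2 (the service names below assume the stock RHEL 5 cman/qdiskd/clvmd packages, so adjust if yours differ):

    chkconfig --list cman qdiskd clvmd
    service cman status
    service clvmd status

    # if any of them show "off" for runlevels 3-5, enable them, e.g.:
    chkconfig cman on
    chkconfig qdiskd on
    chkconfig clvmd on

If clvmd isn't running when the LVM tools start, you get exactly the "Falling back to local file-based locking" warnings in your log, and the clustered VGs are skipped.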
On Thu, Jun 4, 2009 at 2:54 AM, Jean Diallo <[email protected]> wrote:
> Description of problem: In a 2-node cluster, after 1 node is fenced, any
> clvm command hangs on the remaining node. When the fenced node comes back
> into the cluster, any clvm command also hangs; moreover, the node does not
> activate any clustered VG and so cannot access any shared device.
>
> Version-Release number of selected component (if applicable):
> Red Hat 5.2, updated with:
> device-mapper-1.02.28-2.el5.x86_64.rpm
> lvm2-2.02.40-6.el5.x86_64.rpm
> lvm2-cluster-2.02.40-7.el5.x86_64.rpm
>
> Steps to Reproduce:
> 1. 2-node cluster, quorum formed with qdisk
> 2. Cold boot node 2
> 3. Node 2 is evicted and fenced, services are taken over by node 1
> 4. Node 2 comes back into the cluster, quorate, but no clustered VGs are
>    up and any lvm-related command hangs
> 5. At this step every lvm command hangs on node 1
>
> Expected results: node 2 should be able to get back the locks on the
> clustered lvm volumes and node 1 should be able to issue any lvm-related
> command.
>
> Here are my cluster.conf and lvm.conf:
>
> <?xml version="1.0"?>
> <cluster alias="rome" config_version="53" name="rome">
>     <fence_daemon clean_start="0" post_fail_delay="9" post_join_delay="6"/>
>     <clusternodes>
>         <clusternode name="romulus.fr" nodeid="1" votes="1">
>             <fence>
>                 <method name="1">
>                     <device name="ilo172"/>
>                 </method>
>             </fence>
>         </clusternode>
>         <clusternode name="remus.fr" nodeid="2" votes="1">
>             <fence>
>                 <method name="1">
>                     <device name="ilo173"/>
>                 </method>
>             </fence>
>         </clusternode>
>     </clusternodes>
>     <cman expected_votes="3"/>
>     <totem consensus="4800" join="60" token="21002"
>            token_retransmits_before_loss_const="20"/>
>     <fencedevices>
>         <fencedevice agent="fence_ilo" hostname="X.X.X.X"
>                      login="Administrator" name="ilo172" passwd="X.X.X.X"/>
>         <fencedevice agent="fence_ilo" hostname="XXXX"
>                      login="Administrator" name="ilo173" passwd="XXXX"/>
>     </fencedevices>
>     <rm>
>         <failoverdomains/>
>         <resources/>
>         <vm autostart="1" exclusive="0" migrate="live" name="alfrescoP64"
>             path="/etc/xen" recovery="relocate"/>
>         <vm autostart="1" exclusive="0" migrate="live" name="alfrescoI64"
>             path="/etc/xen" recovery="relocate"/>
>         <vm autostart="1" exclusive="0" migrate="live" name="alfrescoS64"
>             path="/etc/xen" recovery="relocate"/>
>     </rm>
>     <quorumd interval="3" label="quorum64" min_score="1" tko="30" votes="1">
>         <heuristic interval="2" program="ping -c3 -t2 X.X.X.X" score="1"/>
>     </quorumd>
> </cluster>
>
> Part of lvm.conf:
>
> # Type 3 uses built-in clustered locking.
> locking_type = 3
>
> # If using external locking (type 2) and initialisation fails,
> # with this set to 1 an attempt will be made to use the built-in
> # clustered locking.
> # If you are using a customised locking_library you should set this to 0.
> fallback_to_clustered_locking = 0
>
> # If an attempt to initialise type 2 or type 3 locking failed, perhaps
> # because cluster components such as clvmd are not running, with this set
> # to 1 an attempt will be made to use local file-based locking (type 1).
> # If this succeeds, only commands against local volume groups will proceed.
> # Volume Groups marked as clustered will be ignored.
> fallback_to_local_locking = 1
>
> # Local non-LV directory that holds file-based locks while commands are
> # in progress. A directory like /tmp that may get wiped on reboot is OK.
> locking_dir = "/var/lock/lvm"
>
> # Other entries can go here to allow you to load shared libraries
> # e.g. if support for LVM1 metadata was compiled as a shared library use
> #   format_libraries = "liblvm2format1.so"
> # Full pathnames can be given.
>
> # Search this directory first for shared libraries.
> # library_dir = "/lib"
>
> # The external locking library to load if locking_type is set to 2.
> # locking_library = "liblvm2clusterlock.so"
>
> Part of the lvm log on the second node:
>
> vgchange.c:165 Activated logical volumes in volume group "VolGroup00"
> vgchange.c:172 7 logical volume(s) in volume group "VolGroup00" now active
> cache/lvmcache.c:1220 Wiping internal VG cache
> commands/toolcontext.c:188 Logging initialised at Wed Jun 3 15:17:29 2009
> commands/toolcontext.c:209 Set umask to 0077
> locking/cluster_locking.c:83 connect() failed on local socket: Connection refused
> locking/locking.c:259 WARNING: Falling back to local file-based locking.
> locking/locking.c:261 Volume Groups with the clustered attribute will be inaccessible.
> toollib.c:578 Finding all volume groups
> toollib.c:491 Finding volume group "VGhomealfrescoS64"
> metadata/metadata.c:2379 Skipping clustered volume group VGhomealfrescoS64
> toollib.c:491 Finding volume group "VGhomealfS64"
> metadata/metadata.c:2379 Skipping clustered volume group VGhomealfS64
> toollib.c:491 Finding volume group "VGvmalfrescoS64"
> metadata/metadata.c:2379 Skipping clustered volume group VGvmalfrescoS64
> toollib.c:491 Finding volume group "VGvmalfrescoI64"
> metadata/metadata.c:2379 Skipping clustered volume group VGvmalfrescoI64
> toollib.c:491 Finding volume group "VGvmalfrescoP64"
> metadata/metadata.c:2379 Skipping clustered volume group VGvmalfrescoP64
> toollib.c:491 Finding volume group "VolGroup00"
> libdm-report.c:981 VolGroup00
> cache/lvmcache.c:1220 Wiping internal VG cache
> commands/toolcontext.c:188 Logging initialised at Wed Jun 3 15:17:29 2009
> commands/toolcontext.c:209 Set umask to 0077
> locking/cluster_locking.c:83 connect() failed on local socket: Connection refused
> locking/locking.c:259 WARNING: Falling back to local file-based locking.
> locking/locking.c:261 Volume Groups with the clustered attribute will be inaccessible.
> toollib.c:542 Using volume group(s) on command line
> toollib.c:491 Finding volume group "VolGroup00"
> vgchange.c:117 7 logical volume(s) in volume group "VolGroup00" monitored
> cache/lvmcache.c:1220 Wiping internal VG cache
> commands/toolcontext.c:188 Logging initialised at Wed Jun 3 15:20:45 2009
> commands/toolcontext.c:209 Set umask to 0077
> toollib.c:331 Finding all logical volumes
> commands/toolcontext.c:188 Logging initialised at Wed Jun 3 15:20:50 2009
> commands/toolcontext.c:209 Set umask to 0077
> toollib.c:578 Finding all volume groups
>
> group_tool on node 1:
>
> type   level  name       id        state
> fence  0      default    00010001  none   [1 2]
> dlm    1      clvmd      00010002  none   [1 2]
> dlm    1      rgmanager  00020002  none   [1]
>
> group_tool on node 2:
>
> [r...@remus ~]# group_tool
> type   level  name       id        state
> fence  0      default    00010001  none   [1 2]
> dlm    1      clvmd      00010002  none   [1 2]
>
> Additional info:
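The "connect() failed on local socket: Connection refused" lines in your log are what you would see if clvmd was not running on node 2 when the LVM tools ran, and the hang on node 1 is consistent with its clvmd/DLM lockspace waiting on the other node. Once cman is quorate and clvmd is confirmed running on both nodes, the clustered VGs usually have to be activated again by hand; a rough sequence (the VG names are taken from your log, the rest is just the standard tooling, so treat it as a sketch rather than a recipe):

    # on node 2, after it has rejoined and is quorate
    service clvmd restart
    vgscan
    vgchange -ay VGvmalfrescoP64 VGvmalfrescoI64 VGvmalfrescoS64
    vgchange -ay VGhomealfrescoS64 VGhomealfS64

    # then check membership and the lockspaces on both nodes
    cman_tool status
    group_tool

If vgchange still hangs at that point, the clvmd lockspace itself is probably stuck, and the hang you see on node 1 is a symptom of the same thing rather than a separate problem.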
--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster
