This is worse than I thought. The entire cluster hangs when the restart command is issued from the Conga (luci) box. I tried bringing the gfs service down on node2 (the luci box) with:

  service gfs stop

(we are not running rgmanager), and I got:

  FATAL: Module gfs is in use.
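For what it's worth, this is roughly how I've been checking what still holds the module (mount point names taken from node3's status output below; just a rough checklist, nothing specific to this failure):

  # list the GFS mounts the kernel still knows about
  grep gfs /proc/mounts
  # show any processes with open files on the mount points
  fuser -vm /lvm_test1 /lvm_test2
  # the use count in the last column is why the module cannot unload
  lsmod | grep gfs

Presumably the use count stays non-zero because the unmount never completes.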
On node3:

service gfs status:

  Configured GFS mountpoints: /lvm_test1 /lvm_test2
  Active GFS mountpoints: /lvm_test1 /lvm_test2

service gfs stop:

  Unmounting GFS filesystems: (hangs)

The nodes are:

  node2 - .175
  node3 - .78
  node4 - .79

All nodes are configured on the same network segment. These are the messages from node3 from the point I tried to restart the cluster:

Sep 26 09:00:38 dev03 openais[8692]: [TOTEM] entering GATHER state from 12.
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] entering GATHER state from 0.
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] Creating commit token because I am the rep.
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] Saving state aru 1e1 high seq received 1e1
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] Storing new sequence id for ring 454
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] entering COMMIT state.
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] entering RECOVERY state.
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] position [0] member xxx.xxx.xxx.78:
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] previous ring seq 1104 rep xxx.xxx.xxx.78
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] aru 1e1 high delivered 1e1 received flag 1
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] position [1] member xxx.xxx.xxx.175:
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] previous ring seq 1104 rep xxx.xxx.xxx.78
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] aru 1e1 high delivered 1e1 received flag 1
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] Did not need to originate any messages in recovery.
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] Sending initial ORF token
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] CLM CONFIGURATION CHANGE
Sep 26 09:00:43 dev03 kernel: dlm: closing connection to node 2
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] New Configuration:
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] r(0) ip(xxx.xxx.xxx.78)
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] r(0) ip(xxx.xxx.xxx.175)
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] Members Left:
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] r(0) ip(xxx.xxx.xxx.79)
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] Members Joined:
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] CLM CONFIGURATION CHANGE
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] New Configuration:
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] r(0) ip(xxx.xxx.xxx.78)
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] r(0) ip(xxx.xxx.xxx.175)
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] Members Left:
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] Members Joined:
Sep 26 09:00:43 dev03 openais[8692]: [SYNC ] This node is within the primary component and will provide service.
Sep 26 09:00:43 dev03 openais[8692]: [TOTEM] entering OPERATIONAL state.
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] got nodejoin message xxx.xxx.xxx.78
Sep 26 09:00:43 dev03 openais[8692]: [CLM  ] got nodejoin message xxx.xxx.xxx.175
Sep 26 09:00:43 dev03 openais[8692]: [CPG  ] got joinlist message from node 3
Sep 26 09:00:43 dev03 fenced[8710]: fencing deferred to fenmrdev02.maritz.com
Sep 26 09:00:43 dev03 openais[8692]: [CPG  ] got joinlist message from node 1
Sep 26 09:00:45 dev03 kernel: GFS: fsid=test1_cluster:gfs_fs1.2: jid=1: Trying to acquire journal lock...
Sep 26 09:00:45 dev03 kernel: GFS: fsid=test1_cluster:gfs_fs1.2: jid=1: Busy
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] entering GATHER state from 11.
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] Creating commit token because I am the rep.
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] Saving state aru 31 high seq received 31
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] Storing new sequence id for ring 458
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] entering COMMIT state.
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] entering RECOVERY state.
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] position [0] member xxx.xxx.xxx.78:
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] previous ring seq 1108 rep xxx.xxx.xxx.78
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] aru 31 high delivered 31 received flag 1
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] position [1] member xxx.xxx.xxx.79:
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] previous ring seq 1108 rep xxx.xxx.xxx.79
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] aru 9 high delivered 9 received flag 1
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] position [2] member xxx.xxx.xxx.175:
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] previous ring seq 1108 rep xxx.xxx.xxx.78
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] aru 31 high delivered 31 received flag 1
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] Did not need to originate any messages in recovery.
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] Sending initial ORF token
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] CLM CONFIGURATION CHANGE
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] New Configuration:
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] r(0) ip(xxx.xxx.xxx.78)
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] r(0) ip(xxx.xxx.xxx.175)
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] Members Left:
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] Members Joined:
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] CLM CONFIGURATION CHANGE
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] New Configuration:
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] r(0) ip(xxx.xxx.xxx.78)
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] r(0) ip(xxx.xxx.xxx.79)
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] r(0) ip(xxx.xxx.xxx.175)
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] Members Left:
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] Members Joined:
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] r(0) ip(xxx.xxx.xxx.79)
Sep 26 09:02:37 dev03 openais[8692]: [SYNC ] This node is within the primary component and will provide service.
Sep 26 09:02:37 dev03 openais[8692]: [TOTEM] entering OPERATIONAL state.
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] got nodejoin message xxx.xxx.xxx.78
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] got nodejoin message xxx.xxx.xxx.79
Sep 26 09:02:37 dev03 openais[8692]: [CLM  ] got nodejoin message xxx.xxx.xxx.175
Sep 26 09:02:37 dev03 openais[8692]: [CPG  ] got joinlist message from node 3
Sep 26 09:02:37 dev03 openais[8692]: [CPG  ] got joinlist message from node 1
Sep 26 09:02:43 dev03 kernel: dlm: connecting to 2

---------- Forwarded message ----------
From: Alan A <[EMAIL PROTECTED]>
Date: Thu, Sep 25, 2008 at 2:04 PM
Subject: GFS volume hangs on 3 nodes after gfs_grow
To: linux clustering <linux-cluster@redhat.com>

Hi all!

I have a 3-node test cluster using SCSI fencing and GFS. I created two GFS logical volumes, lvm1 and lvm2, each using 5GB of a 10GB disk. While testing the command-line tools, I ran lvextend -L +1G /devicename to bring lvm2 to 6GB. This completed without any problems. Then I issued gfs_grow /mountpoint, and the volume became inaccessible. Any command that tries to access the volume hangs, and umount returns:

  /sbin/umount.gfs: /lvm2: device is busy.
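Concretely, the sequence was just these two commands (with /devicename and /mountpoint standing in for the actual LV path and mount point, as above):

  # grow the logical volume from 5GB to 6GB - completed cleanly
  lvextend -L +1G /devicename
  # grow the GFS filesystem into the new space - hung here
  gfs_grow /mountpoint

Since then even a simple ls on the lvm2 mount point hangs, while the same command on lvm1 returns immediately.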
A few questions:

- Since I have two volumes on this cluster and lvm1 works just fine, are there any suggestions for unmounting lvm2 so I can try to fix it?
- Is gfs_grow bug-free or not (i.e. safe to use, or to be avoided)?
- Is there any other way, besides restarting the cluster/nodes, to get lvm2 back into an operational state?

--
Alan A.