Hi,

It looks to me as if the filesystem is corrupt in some manner. Try unmounting it on all nodes and then running fsck on it from one node only. Make sure you have a backup of the data in question first, and save the output of fsck in case it is useful for future debugging.
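As a rough sketch of that sequence (not a tested procedure): the device path, mount point and log location below are hypothetical placeholders, and gfs_fsck is assumed to be the right checker only because the logs show the GFS 0.1.34 module. The script defaults to a dry run that just records what it would do.

```shell
#!/bin/sh
# Sketch only. DEVICE, MOUNTPOINT and LOG are hypothetical placeholders;
# gfs_fsck is assumed to be the checker because the logs show GFS 0.1.34.
DEVICE=/dev/vg_example/lv_files
MOUNTPOINT=/mnt/files
LOG="${TMPDIR:-/tmp}/gfs_fsck-files.log"
DRY_RUN=1   # set to 0 only after backing up and unmounting on every node

# Run a command, appending its stdout, stderr and exit status to the log.
# In dry-run mode the command is only printed, never executed.
run_logged() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*" | tee -a "$LOG"
        return 0
    fi
    { "$@" 2>&1; echo "exit status: $?"; } | tee -a "$LOG"
}

# Step 1: unmount here, and repeat the same umount on every other node.
run_logged umount "$MOUNTPOINT"

# Step 2: on ONE node only, do a report-only pass
# (-n answers "no" to every repair prompt).
run_logged gfs_fsck -n "$DEVICE"

# Step 3: repair pass (-y answers "yes" to every prompt);
# enable it once the report from step 2 has been reviewed.
# run_logged gfs_fsck -y "$DEVICE"
```

Keeping everything in one log file means the fsck output is already saved if it needs to be posted back to the list.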
It's tricky to say exactly what might have gone wrong (the fsck output might give a clue), but you will certainly need fsck to fix whatever the problem is.

Steve.

On Tue, 2010-07-06 at 13:22 +1200, Abraham Alawi wrote:
> The system was running well for a while but lately we had a flaky disk in the
> RAID array which we replaced with a healthy one but suddenly the CLVM/GFS
> became unusable, we can mount GFS but while listing it recursively 'ls -R' it
> hangs with Input/output error, can't even access the c/LVM LUN rawly using
> 'dd' BUT we still can access the LVM PV devices using 'dd'. Reconfiguring the
> LVM volume as a local one and accessing it exclusively from one node doesn't
> make a difference.
>
> RHEL5: 2.6.18-164.11.1.el5
> # modinfo gfs
> filename: /lib/modules/2.6.18-164.11.1.el5/weak-updates/gfs/gfs.ko
> license: GPL
> author: Red Hat, Inc.
> description: Global File System 0.1.34-2.el5
> srcversion: 3B1BAC4069F1A4B556A958A
> depends: dlm
> vermagic: 2.6.18-159.el5 SMP mod_unload gcc-4.1
>
> # uname -r
> 2.6.18-164.11.1.el5
>
> # modinfo /lib/modules/2.6.18-164.11.1.el5/kernel/drivers/block/aoe/aoe.ko
> filename: /lib/modules/2.6.18-164.11.1.el5/kernel/drivers/block/aoe/aoe.ko
> description: AoE block/char driver for 2.6.2 and newer 2.6 kernels
> author: Sam Hopkins <[email protected]>
> license: GPL
> srcversion: 42BF122979AC807F2BB50E6
> depends:
> vermagic: 2.6.18-164.11.1.el5 SMP mod_unload gcc-4.1
> parm: aoe_iflist:aoe_iflist=dev1[,dev2...] (string)
> parm: version:aoe module version 74 (string)
> parm: aoe_dyndevs:Use dynamic minor numbers for devices. (int)
> parm: aoe_deadsecs:After aoe_deadsecs seconds, give up and fail dev. (int)
> parm: aoe_maxout:Only aoe_maxout outstanding packets for every MAC on eX.Y. (int)
> parm: aoe_maxsectors:When nonzero, set the maximum number of sectors per I/O request in new devices. (int)
>
> # modinfo dlm
> filename: /lib/modules/2.6.18-164.11.1.el5/kernel/fs/dlm/dlm.ko
> license: GPL
> author: Red Hat, Inc.
> description: Distributed Lock Manager
> srcversion: E768995007648CA8DB078AE
> depends: configfs
> vermagic: 2.6.18-164.11.1.el5 SMP mod_unload gcc-4.1
> module_sig: 883f3504b56fe19c59c69348c13cf1f1126a509f6ddaee3965ee8b5fcd04163669647a889a9801e09f722187d1de068c0d52cd2b99bc3d475cb6ca1a0
>
> Herein what the kernel spits out:
>
> Jul 6 11:27:36 kiwiland kernel: GFS 0.1.34-2.el5 (built Sep 9 2009 06:54:42) installed
> Jul 6 11:27:36 kiwiland kernel: Lock_DLM (built Sep 9 2009 06:54:38) installed
> Jul 6 11:27:36 kiwiland kernel: Lock_Nolock (built Sep 9 2009 06:54:37) installed
> Jul 6 11:27:36 kiwiland kernel: Trying to join cluster "lock_dlm", "FSC:files"
> Jul 6 11:27:36 kiwiland kernel: Joined cluster. Now mounting FS...
> Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: Trying to acquire journal lock...
> Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: Looking at journal...
> Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: Acquiring the transaction lock...
> Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: Replaying journal...
> Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: Replayed 0 of 11 blocks
> Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: replays = 0, skips = 4, sames = 7
> Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: Journal replayed in 1s
> Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: Done
> Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=1: Trying to acquire journal lock...
> Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=1: Looking at journal...
> Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=1: Done
> Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: Scanning for log elements...
> Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: Found 2 unlinked inodes
> Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: Found quota changes for 2 IDs
> Jul 6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: Done
> Jul 6 11:27:36 kiwiland kernel: Trying to join cluster "lock_dlm", "FSC:webcluster"
> Jul 6 11:27:36 kiwiland kernel: Joined cluster. Now mounting FS...
> Jul 6 11:27:37 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=1: Trying to acquire journal lock...
> Jul 6 11:27:37 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=1: Looking at journal...
> Jul 6 11:27:37 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=1: Done
> Jul 6 11:27:37 kiwiland kernel: GFS: fsid=FSC:webcluster.1: Scanning for log elements...
> Jul 6 11:27:37 kiwiland kernel: GFS: fsid=FSC:webcluster.1: Found 0 unlinked inodes
> Jul 6 11:27:37 kiwiland kernel: GFS: fsid=FSC:webcluster.1: Found quota changes for 0 IDs
> Jul 6 11:27:37 kiwiland kernel: GFS: fsid=FSC:webcluster.1: Done
> Jul 6 11:27:37 kiwiland kernel: Installing knfsd (copyright (C) 1996 [email protected]).
> Jul 6 11:27:39 kiwiland kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> Jul 6 11:27:39 kiwiland kernel: NFSD: starting 90-second grace period
> Jul 6 11:32:21 kiwiland kernel: dlm: closing connection to node 1
> Jul 6 11:33:01 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: Trying to acquire journal lock...
> Jul 6 11:33:01 kiwiland kernel: GFS: fsid=FSC:files.0: fatal: invalid metadata block
> Jul 6 11:33:01 kiwiland kernel: GFS: fsid=FSC:files.0: bh = 1432543247 (magic)
> Jul 6 11:33:01 kiwiland kernel: GFS: fsid=FSC:files.0: function = gfs_rgrp_read
> Jul 6 11:33:01 kiwiland kernel: GFS: fsid=FSC:files.0: file = /builddir/build/BUILD/gfs-kmod-0.1.34/_kmod_build_/src/gfs/rgrp.c, line = 830
> Jul 6 11:33:01 kiwiland kernel: GFS: fsid=FSC:files.0: time = 1278372781
> Jul 6 11:33:01 kiwiland kernel: GFS: fsid=FSC:files.0: about to withdraw from the cluster
> Jul 6 11:33:01 kiwiland kernel: GFS: fsid=FSC:files.0: telling LM to withdraw
> Jul 6 11:33:02 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: Looking at journal...
> Jul 6 11:33:02 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: Acquiring the transaction lock...
> Jul 6 11:33:02 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: Replaying journal...
> Jul 6 11:33:02 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: Replayed 0 of 0 blocks
> Jul 6 11:33:02 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: replays = 0, skips = 0, sames = 0
> Jul 6 11:33:02 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: Journal replayed in 1s
> Jul 6 11:33:02 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: Done
> Jul 6 11:33:02 kiwiland kernel: GFS: fsid=FSC:files.0: withdrawn
> Jul 6 11:33:02 kiwiland kernel:
> Jul 6 11:33:02 kiwiland kernel: Call Trace:
> Jul 6 11:33:02 kiwiland kernel: [<ffffffff88805018>] :gfs:gfs_lm_withdraw+0xc4/0xd3
> Jul 6 11:33:02 kiwiland kernel: [<ffffffff80063a36>] __wait_on_bit+0x60/0x6e
> Jul 6 11:33:02 kiwiland kernel: [<ffffffff8001538b>] sync_buffer+0x0/0x3f
> Jul 6 11:33:02 kiwiland kernel: [<ffffffff80063ab0>] out_of_line_wait_on_bit+0x6c/0x78
> Jul 6 11:33:02 kiwiland kernel: [<ffffffff800a00e5>] wake_bit_function+0x0/0x23
> Jul 6 11:33:02 kiwiland kernel: [<ffffffff8881cc97>] :gfs:gfs_meta_check_ii+0x32/0x3e
> Jul 6 11:33:02 kiwiland kernel: [<ffffffff88819439>] :gfs:gfs_rgrp_read+0x139/0x225
> Jul 6 11:33:02 kiwiland kernel: [<ffffffff887fb8e8>] :gfs:glock_wait_internal+0x229/0x2c3
> Jul 6 11:33:02 kiwiland kernel: [<ffffffff887fbd17>] :gfs:gfs_glock_nq+0x395/0x3d6
> Jul 6 11:33:02 kiwiland kernel: [<ffffffff887fbd6e>] :gfs:gfs_glock_nq_init+0x16/0x2a
> Jul 6 11:33:02 kiwiland kernel: [<ffffffff88817466>] :gfs:gfs_rgrp_lvb_init+0x1e/0x3f
> Jul 6 11:33:02 kiwiland kernel: [<ffffffff8881a46f>] :gfs:gfs_stat_gfs+0x213/0x273
> Jul 6 11:33:02 kiwiland kernel: [<ffffffff8881353d>] :gfs:gfs_statfs+0x67/0xea
> Jul 6 11:33:02 kiwiland kernel: [<ffffffff800deba3>] vfs_statfs+0x63/0x7f
> Jul 6 11:33:02 kiwiland kernel: [<ffffffff8886d2ce>] :nfsd:nfsd_statfs+0x28/0x38
> Jul 6 11:33:02 kiwiland kernel: [<ffffffff888745f8>] :nfsd:nfsd3_proc_fsstat+0x3f/0x54
> Jul 6 11:33:02 kiwiland kernel: [<ffffffff8886a1db>] :nfsd:nfsd_dispatch+0xd8/0x1d6
> Jul 6 11:33:02 kiwiland kernel: [<ffffffff886e0529>] :sunrpc:svc_process+0x454/0x71b
> Jul 6 11:33:02 kiwiland kernel: [<ffffffff80064644>] __down_read+0x12/0x92
> Jul 6 11:33:02 kiwiland kernel: [<ffffffff8886a5a1>] :nfsd:nfsd+0x0/0x2cb
> Jul 6 11:33:02 kiwiland kernel: [<ffffffff8886a746>] :nfsd:nfsd+0x1a5/0x2cb
> Jul 6 11:33:02 kiwiland kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
> Jul 6 11:33:02 kiwiland kernel: [<ffffffff8886a5a1>] :nfsd:nfsd+0x0/0x2cb
> Jul 6 11:33:02 kiwiland kernel: [<ffffffff8886a5a1>] :nfsd:nfsd+0x0/0x2cb
> Jul 6 11:33:02 kiwiland kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
> Jul 6 11:33:02 kiwiland kernel:
>
> Another kernel spit out:
> Jul 5 02:01:19 Hercules kernel: GFS: fsid=FSC:files.0: fast statfs start time = 1278252079
> Jul 5 03:01:16 Hercules kernel: GFS: fsid=FSC:files.0: fast statfs start time = 1278255676
> Jul 5 03:02:17 Hercules kernel: GFS: fsid=FSC:files.0: fatal: invalid metadata block
> Jul 5 03:02:17 Hercules kernel: GFS: fsid=FSC:files.0: bh = 86700288 (magic)
> Jul 5 03:02:17 Hercules kernel: GFS: fsid=FSC:files.0: function = gfs_get_meta_buffer
> Jul 5 03:02:17 Hercules kernel: GFS: fsid=FSC:files.0: file = /builddir/build/BUILD/gfs-kmod-0.1.34/_kmod_build_/src/gfs/dio.c, line = 1225
> Jul 5 03:02:17 Hercules kernel: GFS: fsid=FSC:files.0: time = 1278255737
> Jul 5 03:02:17 Hercules kernel: GFS: fsid=FSC:files.0: about to withdraw from the cluster
> Jul 5 03:02:17 Hercules kernel: GFS: fsid=FSC:files.0: telling LM to withdraw
> Jul 5 03:02:21 Hercules kernel: GFS: fsid=FSC:files.0: withdrawn
> Jul 5 03:02:21 Hercules kernel:
> Jul 5 03:02:21 Hercules kernel: Call Trace:
> Jul 5 03:02:21 Hercules kernel: [<ffffffff8880a018>] :gfs:gfs_lm_withdraw+0xc4/0xd3
> Jul 5 03:02:21 Hercules kernel: [<ffffffff8001538b>] sync_buffer+0x0/0x3f
> Jul 5 03:02:21 Hercules kernel: [<ffffffff80063ab0>] out_of_line_wait_on_bit+0x6c/0x78
> Jul 5 03:02:21 Hercules kernel: [<ffffffff800a00e5>] wake_bit_function+0x0/0x23
> Jul 5 03:02:21 Hercules kernel: [<ffffffff88821c97>] :gfs:gfs_meta_check_ii+0x32/0x3e
> Jul 5 03:02:21 Hercules kernel: [<ffffffff887f7717>] :gfs:gfs_get_meta_buffer+0x1d1/0x247
> Jul 5 03:02:21 Hercules kernel: [<ffffffff88804193>] :gfs:gfs_copyin_dinode+0x1d/0x12f
> Jul 5 03:02:21 Hercules kernel: [<ffffffff88800d6e>] :gfs:gfs_glock_nq_init+0x16/0x2a
> Jul 5 03:02:21 Hercules kernel: [<ffffffff888043e3>] :gfs:inode_create+0x13e/0x1df
> Jul 5 03:02:21 Hercules kernel: [<ffffffff88804a5d>] :gfs:gfs_inode_get+0x9d/0xba
> Jul 5 03:02:21 Hercules kernel: [<ffffffff888053bb>] :gfs:gfs_lookupi+0x33d/0x3df
> Jul 5 03:02:21 Hercules kernel: [<ffffffff887fce57>] :gfs:ea_find_i+0x0/0x6b
> Jul 5 03:02:21 Hercules kernel: [<ffffffff888172af>] :gfs:gfs_lookup+0x363/0x41a
> Jul 5 03:02:21 Hercules kernel: [<ffffffff80025426>] igrab+0x25/0x34
> Jul 5 03:02:21 Hercules kernel: [<ffffffff888055a0>] :gfs:gfs_iget+0x3d/0x1f1
> Jul 5 03:02:21 Hercules kernel: [<ffffffff88801224>] :gfs:gfs_glock_dq+0x13c/0x14b
> Jul 5 03:02:21 Hercules kernel: [<ffffffff8000cf01>] do_lookup+0xe5/0x1e6
> Jul 5 03:02:21 Hercules kernel: [<ffffffff8000a22b>] __link_path_walk+0xa01/0xf42
> Jul 5 03:02:21 Hercules kernel: [<ffffffff8000e9cc>] link_path_walk+0x42/0xb2
> Jul 5 03:02:21 Hercules kernel: [<ffffffff8000cc9c>] do_path_lookup+0x275/0x2f1
> Jul 5 03:02:21 Hercules kernel: [<ffffffff80012752>] getname+0x15b/0x1c2
> Jul 5 03:02:21 Hercules kernel: [<ffffffff800236ba>] __user_walk_fd+0x37/0x4c
> Jul 5 03:02:21 Hercules kernel: [<ffffffff8003f235>] vfs_lstat_fd+0x18/0x47
> Jul 5 03:02:21 Hercules kernel: [<ffffffff8002a95a>] sys_newlstat+0x19/0x31
> Jul 5 03:02:21 Hercules kernel: [<ffffffff8005dde9>] error_exit+0x0/0x84
> Jul 5 03:02:21 Hercules kernel: [<ffffffff8005d116>] system_call+0x7e/0x83
>
> Thanks in advance,
>
> -- Abraham
>
> ''''''''''''''''''''''''''''''''''''''''''''''''''''''
> Abraham Alawi
>
> Unix/Linux Systems Administrator
> Science IT
> University of Auckland
> e: [email protected]
> p: +64-9-373 7599, ext#: 87572
>
> ''''''''''''''''''''''''''''''''''''''''''''''''''''''
>
> --
> Linux-cluster mailing list
> [email protected]
> https://www.redhat.com/mailman/listinfo/linux-cluster
