Hello again,
A few minutes after my initial post, the GFS2 file system also failed on the first node.

..........
GFS2: fsid=tweety:gfs2-00.1: fatal: invalid metadata block
GFS2: fsid=tweety:gfs2-00.1:   bh = 522538 (magic number)
GFS2: fsid=tweety:gfs2-00.1:   function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 332
GFS2: fsid=tweety:gfs2-00.1: about to withdraw this file system
GFS2: fsid=tweety:gfs2-00.1: telling LM to withdraw
GFS2: fsid=tweety:gfs2-00.1: withdrawn
Call Trace:
 [<ffffffff885c3146>] :gfs2:gfs2_lm_withdraw+0xc1/0xd0
 [<ffffffff800639de>] __wait_on_bit+0x60/0x6e
 [<ffffffff80014f46>] sync_buffer+0x0/0x3f
 [<ffffffff80063a58>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff8009d0ca>] wake_bit_function+0x0/0x23
 [<ffffffff885d4f7f>] :gfs2:gfs2_meta_check_ii+0x2c/0x38
 [<ffffffff885c6a06>] :gfs2:gfs2_meta_indirect_buffer+0x104/0x15e
 [<ffffffff885c195a>] :gfs2:gfs2_inode_refresh+0x22/0x2ca
 [<ffffffff885c0d9c>] :gfs2:inode_go_lock+0x29/0x57
 [<ffffffff885bff04>] :gfs2:glock_wait_internal+0x1d4/0x23f
 [<ffffffff885c011d>] :gfs2:gfs2_glock_nq+0x1ae/0x1d4
 [<ffffffff885cc053>] :gfs2:gfs2_lookup+0x58/0xa7
 [<ffffffff885cc04b>] :gfs2:gfs2_lookup+0x50/0xa7
 [<ffffffff800226dd>] d_alloc+0x174/0x1a9
 [<ffffffff8000cbff>] do_lookup+0xe5/0x1e6
 [<ffffffff80009fac>] __link_path_walk+0xa01/0xf42
 [<ffffffff8000e7cd>] link_path_walk+0x5c/0xe5
 [<ffffffff8002cb45>] mntput_no_expire+0x19/0x89
 [<ffffffff800e7430>] sys_getxattr+0x51/0x62
 [<ffffffff8000c99e>] do_path_lookup+0x270/0x2e8
 [<ffffffff80012336>] getname+0x15b/0x1c1
 [<ffffffff80023741>] __user_walk_fd+0x37/0x4c
 [<ffffffff8003ed91>] vfs_lstat_fd+0x18/0x47
 [<ffffffff8002cb45>] mntput_no_expire+0x19/0x89
 [<ffffffff800e7430>] sys_getxattr+0x51/0x62
 [<ffffffff8002a9d3>] sys_newlstat+0x19/0x31
 [<ffffffff8005d229>] tracesys+0x71/0xe0
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0
............
Thank you all for your time,
Theophanis Kontogiannis

From: [email protected] [mailto:[email protected]] On Behalf Of Theophanis Kontogiannis
Sent: Monday, March 16, 2009 5:24 PM
To: 'linux clustering'
Subject: [Linux-cluster] 'ls' makes GFS2 to withdraw

Hello all,

I have CentOS 5.2, kernel 2.6.18-92.1.22.el5.centos.plus, and gfs2-utils-0.1.44-1.el5_2.1. The cluster is two nodes, using DRBD 8.3.2 as the shared block device, with CLVM on top of it and GFS2 on top of that.

After an 'ls' in a directory within the GFS2 file system, I got the following errors.

........
GFS2: fsid=tweety:gfs2-00.0: fatal: invalid metadata block
GFS2: fsid=tweety:gfs2-00.0:   bh = 522538 (magic number)
GFS2: fsid=tweety:gfs2-00.0:   function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 332
GFS2: fsid=tweety:gfs2-00.0: about to withdraw this file system
GFS2: fsid=tweety:gfs2-00.0: telling LM to withdraw
GFS2: fsid=tweety:gfs2-00.0: withdrawn
Call Trace:
 [<ffffffff885c2146>] :gfs2:gfs2_lm_withdraw+0xc1/0xd0
 [<ffffffff800639de>] __wait_on_bit+0x60/0x6e
 [<ffffffff80014f46>] sync_buffer+0x0/0x3f
 [<ffffffff80063a58>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff8009d0ca>] wake_bit_function+0x0/0x23
 [<ffffffff885d3f7f>] :gfs2:gfs2_meta_check_ii+0x2c/0x38
 [<ffffffff885c5a06>] :gfs2:gfs2_meta_indirect_buffer+0x104/0x15e
 [<ffffffff885c095a>] :gfs2:gfs2_inode_refresh+0x22/0x2ca
 [<ffffffff8009d0ca>] wake_bit_function+0x0/0x23
 [<ffffffff885bfd9c>] :gfs2:inode_go_lock+0x29/0x57
 [<ffffffff885bef04>] :gfs2:glock_wait_internal+0x1d4/0x23f
 [<ffffffff885bf11d>] :gfs2:gfs2_glock_nq+0x1ae/0x1d4
 [<ffffffff885cb053>] :gfs2:gfs2_lookup+0x58/0xa7
 [<ffffffff885cb04b>] :gfs2:gfs2_lookup+0x50/0xa7
 [<ffffffff800226dd>] d_alloc+0x174/0x1a9
 [<ffffffff8000cbff>] do_lookup+0xe5/0x1e6
 [<ffffffff80009fac>] __link_path_walk+0xa01/0xf42
 [<ffffffff800c4fe7>] zone_statistics+0x3e/0x6d
 [<ffffffff8000e7cd>] link_path_walk+0x5c/0xe5
 [<ffffffff885bdd6f>] :gfs2:gfs2_glock_put+0x26/0x133
 [<ffffffff8000c99e>] do_path_lookup+0x270/0x2e8
 [<ffffffff80012336>] getname+0x15b/0x1c1
 [<ffffffff80023741>] __user_walk_fd+0x37/0x4c
 [<ffffffff8003ed91>] vfs_lstat_fd+0x18/0x47
 [<ffffffff8002a9d3>] sys_newlstat+0x19/0x31
 [<ffffffff8005d229>] tracesys+0x71/0xe0
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0
..........

Obviously 'ls' was not the cause of the problem, but it triggered the events.

From the other node I can still access the directory on which the 'ls' triggered the above. The directory is full of files listed like this:

?--------- ? ?        ?            ?                ? sched_reply

Almost 50% of the files are shown like that by ls.

The questions are:

1. Is this a (new) GFS2 bug?
2. Is this a recoverable problem (and how)?
3. After a GFS2 file system gets withdrawn, how do we make the node use it again, without rebooting?

Thank you all for your time.
Theophanis Kontogiannis
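For reference, the '?' fields in a listing like the one above are what ls prints when the name is still readable from the directory but stat() on the entry fails. A minimal sketch for enumerating such entries (the helper name is my own, and the directory path would be wherever the GFS2 file system is mounted; `stat -L` is used so that anything whose full resolution fails is reported):

```shell
#!/bin/sh
# List directory entries whose names are visible but whose stat() fails,
# i.e. the entries ls shows as "?--------- ? ? ? ? ?".
list_unstatable() {
    # $1: directory to scan (non-recursive)
    for f in "$1"/* "$1"/.[!.]*; do
        # Skip unexpanded glob patterns when nothing matched.
        [ -e "$f" ] || [ -L "$f" ] || continue
        # -L follows symlinks, so any entry that cannot be fully
        # resolved (including I/O errors on a withdrawn fs) is printed.
        stat -L "$f" >/dev/null 2>&1 || echo "$f"
    done
}

# Hypothetical usage on the affected mount:
# list_unstatable /mnt/gfs2-00/some/dir
```

On a healthy file system this prints nothing; on the withdrawn GFS2 it should print exactly the entries that ls renders with question marks.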
--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster
