Hi guys,
I have a two-node GFS2 cluster built on a logical volume created on the DRBD
block device /dev/drbd0. Each node exports its GFS2 mount point as a Samba
share, and two clients each mount one of the shares and copy data into it.
Hours later, one client (call it clientA) had finished all its tasks, while the
other client (call it clientB) was still copying at a very slow write speed
(2-3 MB/s, versus 40-100 MB/s in the normal case).
I suspected that something was wrong with the GFS2 filesystem on the server
node that clientB mounts from, so I tried to write some data on that node by
executing the following command:
[root@dcs-229 ~]# dd if=/dev/zero of=./data2 bs=128k count=1000
1000+0 records in
1000+0 records out
131072000 bytes (131 MB) copied, 183.152 s, 716 kB/s
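As a sanity check on that figure, here is a small sketch that recomputes the rate from the byte count and elapsed time dd printed. The variable names (bytes, secs, rate_kb) are only illustrative, and it assumes dd reports decimal kilobytes (1 kB = 1000 bytes):

```shell
# Recompute the write rate from dd's own summary line.
bytes=131072000   # byte count reported by dd above
secs=183.152      # elapsed seconds reported by dd above

# Bytes per second, converted to decimal kB/s (assumption: 1 kB = 1000 bytes).
rate_kb=$(awk -v b="$bytes" -v s="$secs" 'BEGIN { printf "%.0f", b / s / 1000 }')

echo "${rate_kb} kB/s"
```

This gives the same 716 kB/s that dd printed, so the reported rate is consistent with the run time.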
At 716 kB/s the write is far too slow and almost hangs. I ran it a second time
and it hung completely. When I terminated it with Ctrl+C, the kernel reported
the following error messages:
Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: fatal: invalid metadata block
Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: bh = 25 (magic number)
Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 393
Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: jid=0: Trying to acquire journal lock...
Nov 12 11:50:11 dcs-229 kernel: Pid: 12044, comm: glock_workqueue Not tainted 2.6.32-358.el6.x86_64 #1
Nov 12 11:50:11 dcs-229 kernel: Call Trace:
Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa044be22>] ? gfs2_lm_withdraw+0x102/0x130 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [<ffffffff81096cc0>] ? wake_bit_function+0x0/0x50
Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa044bf75>] ? gfs2_meta_check_ii+0x45/0x50 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa04367d9>] ? gfs2_meta_indirect_buffer+0xf9/0x100 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [<ffffffff8105e203>] ? perf_event_task_sched_out+0x33/0x80
Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa0431505>] ? gfs2_inode_refresh+0x25/0x2c0 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa0430b48>] ? inode_go_lock+0x88/0xf0 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa042f25b>] ? do_promote+0x1bb/0x330 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa042f548>] ? finish_xmote+0x178/0x410 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa04303e3>] ? glock_work_func+0x133/0x1d0 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [<ffffffffa04302b0>] ? glock_work_func+0x0/0x1d0 [gfs2]
Nov 12 11:50:11 dcs-229 kernel: [<ffffffff81090ac0>] ? worker_thread+0x170/0x2a0
Nov 12 11:50:11 dcs-229 kernel: [<ffffffff81096c80>] ? autoremove_wake_function+0x0/0x40
Nov 12 11:50:11 dcs-229 kernel: [<ffffffff81090950>] ? worker_thread+0x0/0x2a0
Nov 12 11:50:11 dcs-229 kernel: [<ffffffff81096916>] ? kthread+0x96/0xa0
Nov 12 11:50:11 dcs-229 kernel: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
Nov 12 11:50:11 dcs-229 kernel: [<ffffffff81096880>] ? kthread+0x0/0xa0
Nov 12 11:50:11 dcs-229 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Nov 12 11:50:11 dcs-229 kernel: GFS2: fsid=MyCluster:gfs.1: jid=0: Failed
The other node also reported error messages:
Nov 12 11:48:50 dcs-226 kernel: Pid: 13784, comm: glock_workqueue Not tainted 2.6.32-358.el6.x86_64 #1
Nov 12 11:48:50 dcs-226 kernel: Call Trace:
Nov 12 11:48:50 dcs-226 kernel: [<ffffffffa0478e22>] ? gfs2_lm_withdraw+0x102/0x130 [gfs2]
Nov 12 11:48:50 dcs-226 kernel: [<ffffffff81096cc0>] ? wake_bit_function+0x0/0x50
Nov 12 11:48:50 dcs-226 kernel: [<ffffffffa0478f75>] ? gfs2_meta_check_ii+0x45/0x50 [gfs2]
Nov 12 11:48:50 dcs-226 kernel: [<ffffffffa04637d9>] ? gfs2_meta_indirect_buffer+0xf9/0x100 [gfs2]
Nov 12 11:48:50 dcs-226 kernel: [<ffffffff8105e203>] ? perf_event_task_sched_out+0x33/0x80
Nov 12 11:48:50 dcs-226 kernel: [<ffffffffa045e505>] ? gfs2_inode_refresh+0x25/0x2c0 [gfs2]
Nov 12 11:48:50 dcs-226 kernel: [<ffffffffa045db48>] ? inode_go_lock+0x88/0xf0 [gfs2]
Nov 12 11:48:50 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: fatal: invalid metadata block
Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: bh = 66213 (magic number)
Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 393
Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: about to withdraw this file system
Nov 12 11:48:51 dcs-226 kernel: GFS2: fsid=MyCluster:gfs.0: telling LM to unmount
Nov 12 11:48:51 dcs-226 kernel: [<ffffffffa045c25b>] ? do_promote+0x1bb/0x330 [gfs2]
Nov 12 11:48:51 dcs-226 kernel: [<ffffffffa045c548>] ? finish_xmote+0x178/0x410 [gfs2]
Nov 12 11:48:51 dcs-226 kernel: [<ffffffffa045d3e3>] ? glock_work_func+0x133/0x1d0 [gfs2]
Nov 12 11:48:51 dcs-226 kernel: [<ffffffffa045d2b0>] ? glock_work_func+0x0/0x1d0 [gfs2]
Nov 12 11:48:51 dcs-226 kernel: [<ffffffff81090ac0>] ? worker_thread+0x170/0x2a0
Nov 12 11:48:51 dcs-226 kernel: [<ffffffff81096c80>] ? autoremove_wake_function+0x0/0x40
Nov 12 11:48:51 dcs-226 kernel: [<ffffffff81090950>] ? worker_thread+0x0/0x2a0
Nov 12 11:48:51 dcs-226 kernel: [<ffffffff81096916>] ? kthread+0x96/0xa0
Nov 12 11:48:51 dcs-226 kernel: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
Nov 12 11:48:51 dcs-226 kernel: [<ffffffff81096880>] ? kthread+0x0/0xa0
Nov 12 11:48:51 dcs-226 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
After this, the mount points crashed. What should I do? Could anyone help me?
--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster