Hi,

On Wed, 2013-09-25 at 16:25 +0200, Pavel Herrmann wrote:
> Hi
> 
> I am trying to build a two-node cluster for samba, but I'm having some GFS2
> issues.
> 
> The nodes themselves run as virtual machines in KVM (on different hosts), use
> gentoo kernel 3.10.7 (not sure what exact version of vanilla it is based on),
> and I use the cluster-next stack in somewhat minimal configuration (corosync-2
> with DLM-4, no pacemaker).
> 
> While testing my cluster (using smbtorture), everything works fine, but the
> moment I let users onto it, I get a kernel error that hangs the cluster
> (fencing is set up and working, but doesn't kick in for some reason).
> 
I suspect that this has been fixed, but without knowing exactly what
version of the kernel this is and what patches have been applied to the
kernel, I'm afraid that I'm a bit in the dark. I don't think we've seen
anything like this recently relating to type 5 glocks.

Steve.

> this is what I get in kernel log:
> 
> Sep 25 07:10:12 fs2 kernel: [18024.888481] GFS2: fsid=fs_clust:homes.1: quota 
> exceeded for user 104202
> Sep 25 07:10:18 fs2 kernel: [18030.335727] GFS2: fsid=fs_clust:homes.1: quota 
> exceeded for user 104202
> Sep 25 07:10:23 fs2 kernel: [18035.994476] original: 
> gfs2_inode_lookup+0x128/0x240 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.994482] pid: 25317
> Sep 25 07:10:23 fs2 kernel: [18035.994484] lock type: 5 req lock state : 3
> Sep 25 07:10:23 fs2 kernel: [18035.994491] new: gfs2_inode_lookup+0x128/0x240 
> [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.994493] pid: 25317
> Sep 25 07:10:23 fs2 kernel: [18035.994494] lock type: 5 req lock state : 3
> Sep 25 07:10:23 fs2 kernel: [18035.994498]  G:  s:SH n:5/168b15e f:Iqob t:SH 
> d:EX/0 a:0 v:0 r:4 m:50
> Sep 25 07:10:23 fs2 kernel: [18035.994506]   H: s:SH f:EH e:0 p:25317 [smbd] 
> gfs2_inode_lookup+0x128/0x240 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.994549] general protection fault: 0000 
> [#1] SMP 
> Sep 25 07:10:23 fs2 kernel: [18035.994840] Modules linked in: iptable_filter 
> ip_tables x_tables gfs2 dm_mod dlm sctp libcrc32c ipv6 configfs virtio_net 
> i6300esb
> Sep 25 07:10:23 fs2 kernel: [18035.995617] CPU: 2 PID: 25317 Comm: smbd Not 
> tainted 3.10.7-gentoo #10
> Sep 25 07:10:23 fs2 kernel: [18035.995910] Hardware name: Bochs Bochs, BIOS 
> Bochs 01/01/2011
> Sep 25 07:10:23 fs2 kernel: [18035.996191] task: ffff8800b2aa1b00 ti: 
> ffff8800a4a02000 task.ti: ffff8800a4a02000
> Sep 25 07:10:23 fs2 kernel: [18035.996546] RIP: 0010:[<ffffffff81053bcb>]  
> [<ffffffff81053bcb>] pid_task+0xb/0x40
> Sep 25 07:10:23 fs2 kernel: [18035.996999] RSP: 0018:ffff8800a4a03a10  
> EFLAGS: 00010206
> Sep 25 07:10:23 fs2 kernel: [18035.997253] RAX: 13270cbeaaf4957b RBX: 
> ffff8800988f7710 RCX: 0000000000000006
> Sep 25 07:10:23 fs2 kernel: [18035.997592] RDX: 0000000000000007 RSI: 
> 0000000000000000 RDI: 13270cbeaaf4957b
> Sep 25 07:10:23 fs2 kernel: [18035.997934] RBP: ffff8800a4b43ba0 R08: 
> 000000000000000a R09: 0000000000000000
> Sep 25 07:10:23 fs2 kernel: [18035.998019] R10: 0000000000000191 R11: 
> 0000000000000190 R12: 0000000000000000
> Sep 25 07:10:23 fs2 kernel: [18035.998019] R13: ffff8800a4b43bf0 R14: 
> ffffffffa0133720 R15: ffff8800995bd988
> Sep 25 07:10:23 fs2 kernel: [18035.998019] FS:  00007f1846316740(0000) 
> GS:ffff8800bfb00000(0000) knlGS:0000000000000000
> Sep 25 07:10:23 fs2 kernel: [18035.998019] CS:  0010 DS: 0000 ES: 0000 CR0: 
> 0000000080050033
> Sep 25 07:10:23 fs2 kernel: [18035.998019] CR2: 000000000122aae8 CR3: 
> 000000009880c000 CR4: 00000000000007a0
> Sep 25 07:10:23 fs2 kernel: [18035.998019] DR0: 0000000000000000 DR1: 
> 0000000000000000 DR2: 0000000000000000
> Sep 25 07:10:23 fs2 kernel: [18035.998019] DR3: 0000000000000000 DR6: 
> 00000000ffff0ff0 DR7: 0000000000000400
> Sep 25 07:10:23 fs2 kernel: [18035.998019] Stack:
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  ffffffffa0111f07 ffff8800b2aa1e70 
> ffffffffa011ffd8 0000000000000000
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  0000000000000000 0000000000000000 
> ffff880000000004 0000000000000032
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  ffff8800a4b43ba0 ffff8800a4b43bf0 
> 00000000626f7149 ffff8800995bd988
> Sep 25 07:10:23 fs2 kernel: [18035.998019] Call Trace:
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa0111f07>] ? 
> gfs2_dump_glock+0x1c7/0x360 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa011ffd8>] ? 
> gfs2_inode_lookup+0x128/0x240 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff81457b2b>] ? 
> printk+0x4f/0x54
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff81132e7d>] ? 
> inode_init_always+0xed/0x1b0
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8145f245>] ? 
> _raw_spin_lock+0x5/0x10
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa01138bb>] ? 
> gfs2_glock_nq+0x30b/0x3e0 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa011ffe0>] ? 
> gfs2_inode_lookup+0x130/0x240 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa0109195>] ? 
> gfs2_dirent_search+0xe5/0x1c0 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa010a4aa>] ? 
> gfs2_dir_search+0x4a/0x80 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa01202f7>] ? 
> gfs2_lookupi+0xf7/0x1f0 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa01203b9>] ? 
> gfs2_lookupi+0x1b9/0x1f0 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa0121821>] ? 
> gfs2_lookup+0x21/0xa0 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8145f245>] ? 
> _raw_spin_lock+0x5/0x10
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff811315e6>] ? 
> d_alloc+0x76/0x90
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff81124be3>] ? 
> lookup_dcache+0xa3/0xd0
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff811246c4>] ? 
> lookup_real+0x14/0x50
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff81124c42>] ? 
> __lookup_hash+0x32/0x50
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff81459d64>] ? 
> lookup_slow+0x3c/0xa2
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8145f245>] ? 
> _raw_spin_lock+0x5/0x10
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff81126edf>] ? 
> path_lookupat+0x23f/0x780
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa011f169>] ? 
> gfs2_getxattr+0x79/0xa0 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8112744f>] ? 
> filename_lookup+0x2f/0xc0
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff81125ccc>] ? 
> getname_flags+0xbc/0x1a0
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8112a32c>] ? 
> user_path_at_empty+0x5c/0xb0
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffffa01122c6>] ? 
> gfs2_holder_uninit+0x16/0x30 [gfs2]
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8111f8fd>] ? 
> cp_new_stat+0x10d/0x120
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8111facf>] ? 
> vfs_fstatat+0x3f/0x90
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8111fc02>] ? 
> SYSC_newstat+0x12/0x30
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8105ed51>] ? 
> lg_local_lock+0x11/0x20
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  [<ffffffff8145ff69>] ? 
> system_call_fastpath+0x16/0x1b
> Sep 25 07:10:23 fs2 kernel: [18035.998019] Code: 31 f6 48 85 c0 74 0c 8b 50 
> 04 48 c1 e2 05 48 8b 74 10 38 e9 28 ff ff ff 0f 1f 84 00 00 00 00 00 48 85 ff 
> 74 23 89 f6 48 8d 04 f7 <48> 8b 40 08 48 85 c0 74 1c 48 8d 14 76 48 8d 14 d5 
> 30 02 00 00 
> Sep 25 07:10:23 fs2 kernel: [18035.998019] RIP  [<ffffffff81053bcb>] 
> pid_task+0xb/0x40
> Sep 25 07:10:23 fs2 kernel: [18035.998019]  RSP <ffff8800a4a03a10>
> Sep 25 07:10:23 fs2 kernel: [18036.033702] ---[ end trace e5751bbc7d3a8d7c 
> ]---
> 
> 
> A simple inspection of the gfs2 code showed this is caused by attempting a
> recursive lock. Two gfs2_inode_lookup calls are visible in the trace, though
> I am not sure that is strictly relevant.
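
For anyone triaging similar logs, here is a minimal sketch of how the
"original:" / "new:" holder reports above could be checked mechanically. It
assumes the log lines have been un-wrapped (one kernel message per line) and
that the message layout matches the excerpt in this thread. The function name
find_recursive_requests is hypothetical and not part of any gfs2 tooling.

```python
import re

# Matches the bracketed kernel timestamp and captures the message after it.
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s*(.*)$")
LOCK_RE = re.compile(r"lock type:\s*(\d+)\s+req lock state\s*:\s*(\d+)")


def find_recursive_requests(lines):
    """Return (pid, lock_type, state) for each original/new report pair
    where the same pid asks for the same lock type twice."""
    holders = []      # holder reports in the order they appear
    current = None    # report currently being filled in
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        msg = m.group(2).strip()
        if msg.startswith(("original:", "new:")):
            current = {"kind": msg.split(":", 1)[0]}
            holders.append(current)
        elif current is not None and msg.startswith("pid:"):
            current["pid"] = int(msg.split(":", 1)[1])
        elif current is not None and msg.startswith("lock type:"):
            t = LOCK_RE.match(msg)
            if t:
                current["type"] = int(t.group(1))
                current["state"] = int(t.group(2))
            current = None  # the lock type line ends one report
    hits = []
    for a, b in zip(holders, holders[1:]):
        same_pid = a.get("pid") is not None and a.get("pid") == b.get("pid")
        if (a["kind"], b["kind"]) == ("original", "new") and same_pid \
                and a.get("type") == b.get("type"):
            hits.append((b["pid"], b["type"], b.get("state")))
    return hits


# Un-wrapped copies of the six report lines quoted in this thread.
SAMPLE = [
    "Sep 25 07:10:23 fs2 kernel: [18035.994476] original: gfs2_inode_lookup+0x128/0x240 [gfs2]",
    "Sep 25 07:10:23 fs2 kernel: [18035.994482] pid: 25317",
    "Sep 25 07:10:23 fs2 kernel: [18035.994484] lock type: 5 req lock state : 3",
    "Sep 25 07:10:23 fs2 kernel: [18035.994491] new: gfs2_inode_lookup+0x128/0x240 [gfs2]",
    "Sep 25 07:10:23 fs2 kernel: [18035.994493] pid: 25317",
    "Sep 25 07:10:23 fs2 kernel: [18035.994494] lock type: 5 req lock state : 3",
]

if __name__ == "__main__":
    for pid, lock_type, state in find_recursive_requests(SAMPLE):
        print(f"pid {pid} requested a type {lock_type} glock twice (state {state})")
```

This only flags the pattern visible in the log (same pid, same lock type, both
requests from the same call site). It says nothing about why the second
request was issued.
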
> 
> This is followed by a (probably related) trace:
> 
> 
> Sep 25 07:10:24 fs2 kernel: [18036.162513] BUG: unable to handle kernel NULL 
> pointer dereference at 0000000000000070
> Sep 25 07:10:24 fs2 kernel: [18036.164016] IP: [<ffffffffa011f7c6>] 
> gfs2_permission+0x56/0x110 [gfs2]
> Sep 25 07:10:24 fs2 kernel: [18036.164016] PGD 989a3067 PUD 9886a067 PMD 0 
> Sep 25 07:10:24 fs2 kernel: [18036.164016] Oops: 0000 [#2] SMP 
> Sep 25 07:10:24 fs2 kernel: [18036.164016] Modules linked in: iptable_filter 
> ip_tables x_tables gfs2 dm_mod dlm sctp libcrc32c ipv6 configfs virtio_net 
> i6300esb
> Sep 25 07:10:24 fs2 kernel: [18036.164016] CPU: 1 PID: 25453 Comm: smbd 
> Tainted: G      D      3.10.7-gentoo #10
> Sep 25 07:10:24 fs2 kernel: [18036.164016] Hardware name: Bochs Bochs, BIOS 
> Bochs 01/01/2011
> Sep 25 07:10:24 fs2 kernel: [18036.164016] task: ffff8800afca0d80 ti: 
> ffff8800a4a02000 task.ti: ffff8800a4a02000
> Sep 25 07:10:24 fs2 kernel: [18036.164016] RIP: 0010:[<ffffffffa011f7c6>]  
> [<ffffffffa011f7c6>] gfs2_permission+0x56/0x110 [gfs2]
> Sep 25 07:10:24 fs2 kernel: [18036.164016] RSP: 0018:ffff8800a4a03c08  
> EFLAGS: 00010286
> Sep 25 07:10:24 fs2 kernel: [18036.164016] RAX: ffffffff8145f245 RBX: 
> 0000000000000040 RCX: 0000000000000000
> Sep 25 07:10:24 fs2 kernel: [18036.164016] RDX: ffff8800b5668f00 RSI: 
> 0000000000000001 RDI: ffff8800a4b97ddc
> Sep 25 07:10:24 fs2 kernel: [18036.164016] RBP: ffff880099486e60 R08: 
> 0000000000000061 R09: 0000000000000000
> Sep 25 07:10:24 fs2 kernel: [18036.164016] R10: ff48ad3954b34002 R11: 
> d09e94939e979e85 R12: ffff8800a4b97ddc
> Sep 25 07:10:24 fs2 kernel: [18036.164016] R13: 0000000000000001 R14: 
> ffff8800a4b97df8 R15: ffff8800afca0d80
> Sep 25 07:10:24 fs2 kernel: [18036.164016] FS:  00007f1846316740(0000) 
> GS:ffff8800bfa80000(0000) knlGS:0000000000000000
> Sep 25 07:10:24 fs2 kernel: [18036.164016] CS:  0010 DS: 0000 ES: 0000 CR0: 
> 0000000080050033
> Sep 25 07:10:24 fs2 kernel: [18036.164016] CR2: 0000000000000070 CR3: 
> 000000009880c000 CR4: 00000000000007a0
> Sep 25 07:10:24 fs2 kernel: [18036.164016] DR0: 0000000000000000 DR1: 
> 0000000000000000 DR2: 0000000000000000
> Sep 25 07:10:24 fs2 kernel: [18036.164016] DR3: 0000000000000000 DR6: 
> 00000000ffff0ff0 DR7: 0000000000000400
> Sep 25 07:10:24 fs2 kernel: [18036.164016] Stack:
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  ffff8800994e0c00 ffffffff81125a8b 
> ffff8800a4a03c18 ffff8800a4a03c18
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  0000000000000000 ffff8800bbba8d20 
> 0000000800000003 0000000200000000
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  ffffffff8145f245 ffffffff8112ff5e 
> ffff8800a4a03e08 0000000000000007
> Sep 25 07:10:24 fs2 kernel: [18036.164016] Call Trace:
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff81125a8b>] ? 
> lookup_fast+0x1ab/0x2f0
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff8145f245>] ? 
> _raw_spin_lock+0x5/0x10
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff8112ff5e>] ? 
> dput+0x17e/0x220
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff8112610a>] ? 
> link_path_walk+0x23a/0x8b0
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff81126b9c>] ? 
> path_init+0x30c/0x410
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff81126cf2>] ? 
> path_lookupat+0x52/0x780
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff8112744f>] ? 
> filename_lookup+0x2f/0xc0
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff81125ccc>] ? 
> getname_flags+0xbc/0x1a0
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff8112a32c>] ? 
> user_path_at_empty+0x5c/0xb0
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff8111facf>] ? 
> vfs_fstatat+0x3f/0x90
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff8111fc02>] ? 
> SYSC_newstat+0x12/0x30
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff8111b420>] ? 
> SyS_read+0x50/0xa0
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  [<ffffffff8145ff69>] ? 
> system_call_fastpath+0x16/0x1b
> Sep 25 07:10:24 fs2 kernel: [18036.164016] Code: c6 50 65 48 8b 04 25 80 b7 
> 00 00 48 8b 90 40 02 00 00 4c 39 f3 75 14 eb 1a 0f 1f 40 00 48 3b 53 18 74 12 
> 48 8b 1b 49 39 de 74 08 <48> 8b 43 30 a8 40 75 ea 31 db 4c 89 e7 e8 e8 78 f0 
> e0 66 90 45 
> Sep 25 07:10:24 fs2 kernel: [18036.164016] RIP  [<ffffffffa011f7c6>] 
> gfs2_permission+0x56/0x110 [gfs2]
> Sep 25 07:10:24 fs2 kernel: [18036.164016]  RSP <ffff8800a4a03c08>
> Sep 25 07:10:24 fs2 kernel: [18036.164016] CR2: 0000000000000070
> Sep 25 07:10:24 fs2 kernel: [18036.218133] ---[ end trace e5751bbc7d3a8d7d 
> ]---
> 
> Afterwards the log is filled with "INFO: rcu_sched self-detected stall"
> messages and NMI-caused backtraces.
> 
> Is this a known-and-fixed bug? Is there a way to prevent this?
> 
> 
> thanks
> Pavel Herrmann
> 


-- 
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
