Hi,

On 10/26/2016 09:59 AM, Joseph Qi wrote:
> I don't think so. Commit 2070ad1aebff has been merged to 4.8-rc1, but
> Gerhard uses 4.7.6.
> From the call trace, it seems because of dentry lock issue. I am not
> sure if there are any changes on this.
> I suggest use stable-4.8.4 and try the same case.

Oh yes, this bug is on the call chain of the ref-walk of path-name lookup.
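
To check directly whether a given kernel tag already contains a commit such as 2070ad1aebff, a minimal sketch along these lines might help. It relies on `git merge-base --is-ancestor`, which exits 0 when the first commit is reachable from the second. The helper name `contains_commit` and the repository path in the usage comment are hypothetical, and a tag like v4.7.6 is assumed to be present in the clone (for example a linux-stable checkout).

```shell
# contains_commit REPO COMMIT TAG
# Hypothetical helper: succeeds (exit 0) if COMMIT is an ancestor of TAG
# in the git repository at REPO, and fails otherwise.
contains_commit() {
    git -C "$1" merge-base --is-ancestor "$2" "$3"
}

# Usage (the path is an assumption, adjust to your own clone):
#   contains_commit ~/linux-stable 2070ad1aebff v4.7.6 && echo yes || echo no
```

Note that this only matches the exact commit hash; a fix backported to a stable branch gets a different hash and would have to be searched for by subject instead.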
BTW, I used the wrong git command:

$ git describe 2070ad1aebff
v4.7-10770-g2070ad1

which should be:

$ git describe --contains 2070ad1aebff
v4.8-rc1~52^2~109

Eric

>
> Thanks,
> Joseph
>
> On 2016/10/26 9:24, Eric Ren wrote:
>> Hi Joseph,
>>
>> Is the following patch for this issue?
>> ```
>> commit 3bb8b653c86f6b1d2cc05aa1744fed4b18f99485
>> Author: Joseph Qi <joseph...@huawei.com>
>> Date: Mon Sep 19 14:44:33 2016 -0700
>>
>> ocfs2: fix double unlock in case retry after free truncate log
>>
>> If ocfs2_reserve_cluster_bitmap_bits() fails with ENOSPC, it will try to
>> free truncate log and then retry. Since ocfs2_try_to_free_truncate_log
>> will lock/unlock global bitmap inode, we have to unlock it before
>> calling this function. But when retry reserve and it fails with no
>> /* reserve -> deserve, i think */
>> global bitmap inode lock taken, it will unlock again in error handling
>> branch and BUG.
>>
>> This issue also exists if no need retry and then ocfs2_inode_lock fails.
>> So fix it.
>>
>> Fixes: 2070ad1aebff ("ocfs2: retry on ENOSPC if sufficient space in
>> truncate log")
>> Link: http://lkml.kernel.org/r/57d91939.6030...@huawei.com
>> Signed-off-by: Joseph Qi <joseph...@huawei.com>
>> Signed-off-by: Jiufei Xue <xuejiu...@huawei.com>
>> Cc: Mark Fasheh <mfas...@suse.de>
>> Cc: Joel Becker <jl...@evilplan.org>
>> Cc: Junxiao Bi <junxiao...@oracle.com>
>> Signed-off-by: Andrew Morton <a...@linux-foundation.org>
>> Signed-off-by: Linus Torvalds <torva...@linux-foundation.org>
>> ```
>>
>> If so, Gerhard, try to backport this fix.
>>
>> Eric
>>
>> On 10/26/2016 05:29 AM, Gerhard Mack wrote:
>>> Hello,
>>>
>>> I had a server reboot on me and I'm at a loss as to what caused this
>>> crash. Please keep in mind this server is mission critical and my
>>> options for testing are rather limited.
>>>
>>> Anyone have any ideas?
>>> Gerhard
>>>
>>>
>>> Oct 25 15:38:38 172.28.23.18 kernel: [ 180.900950] o2net: Connected to
>>> node monmailcl01 (num 1) at 10.45.0.11:7777
>>> Oct 25 15:38:39 172.28.23.18 kernel: [ 181.455469] o2dlm: Node 1 joins
>>> domain 85372A5B9E7C4C2C95F1E9922D5A83AF ( 1 2 ) 2 nodes
>>> Oct 25 15:38:40 172.28.23.18 kernel: [ 182.972901] o2dlm: Node 1 joins
>>> domain 490180441A5248339D36ECD96514427C ( 1 2 ) 2 nodes
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.410379] ------------[ cut
>>> here ]------------
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.410452] kernel BUG at
>>> fs/ocfs2/dlmglue.c:780!
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.410515] invalid opcode: 0000
>>> [#1] SMP
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.410576] Modules linked in:
>>> xt_multiport iptable_filter ocfs2 quota_tree xt_tcpudp iptable_mangle
>>> xt_mark
>>> ip_tables x_tables ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm
>>> ocfs2_nodemanager ocfs2_stackglue ib_iser rdma_cm iw_cm ib_cm ib_core
>>> configfs iscsi_tcp
>>> libiscsi_tcp libiscsi scsi_transport_iscsi bonding ext4 crc16 jbd2
>>> mbcache coretemp kvm_intel kvm snd_pcm irqbypass snd_timer snd soundcore
>>> pcspkr
>>> iTCO_wdt iTCO_vendor_support dcdbas evdev shpchp serio_raw i2c_i801
>>> i2c_core acpi_cpufreq lpc_ich mfd_core tpm_tis tpm i5100_edac button
>>> edac_core
>>> processor loop autofs4 xfs crc32c_generic libcrc32c raid1 md_mod sg
>>> sd_mod hid_generic usbhid hid ahci libahci libata e1000e scsi_mod
>>> uhci_hcd ehci_pci
>>> ehci_hcd usbcore ptp psmouse pps_core usb_common r8169 mii
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] CPU: 3 PID: 3563
>>> Comm: imap Not tainted 4.7.6 #8
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] Hardware name:
>>> Dell CS24-SC /CS24-SC , BIOS S45_3A20
>>> 01/21/2009
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] task:
>>> ffff8800bb35cd00 ti: ffff8800bb2d8000 task.ti: ffff8800bb2d8000
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] RIP:
>>> 0010:[<ffffffffa0535365>] [<ffffffffa0535365>]
>>> __ocfs2_cluster_unlock.isra.34+0x4a/0x92 [ocfs2]
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] RSP:
>>> 0018:ffff8800bb2dbbe0 EFLAGS: 00010046
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] RAX:
>>> 0000000000000246 RBX: ffff8800bbbd7a18 RCX: 000000000005a25c
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] RDX:
>>> 0000000000000000 RSI: ffff8800bbbd7a18 RDI: ffff8800bbbd7a84
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] RBP:
>>> ffff8800bbbd7a84 R08: ffff8800bb2d8000 R09: 0000000000000001
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] R10:
>>> ffff8800bb2dbbd8 R11: 000000000000000b R12: ffff88041782b000
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] R13:
>>> 0000000000000246 R14: 0000000000000003 R15: 0000000000000003
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] FS:
>>> 00007fe9a96c2700(0000) GS:ffff88043fcc0000(0000) knlGS:0000000000000000
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] CS: 0010 DS: 0000
>>> ES: 0000 CR0: 0000000080050033
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] CR2:
>>> 000056169b47e000 CR3: 00000000bb112000 CR4: 00000000000406e0
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] Stack:
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] ffff88042d757c00
>>> 0000000000000000 ffff88042a0e1b40 ffff8800ba8194d8
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] 0000000000000000
>>> ffffffffa0528ce0 ffff88042a0e1b78 ffff8800ba8194d8
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] 0000000000000000
>>> ffff88042a0e1b40 ffff88042a0e1b40 ffff8800ba8194d8
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] Call Trace:
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] [<ffffffffa0528ce0>]
>>> ? ocfs2_dentry_attach_lock+0x2c2/0x3f2 [ocfs2]
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] [<ffffffffa0548a8d>]
>>> ? ocfs2_lookup+0x17c/0x268 [ocfs2]
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] [<ffffffff81140925>]
>>> ? lookup_slow+0xcf/0x104
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] [<ffffffff811422fa>]
>>> ? walk_component+0x69/0x12b
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] [<ffffffff81142890>]
>>> ? path_lookupat+0x7d/0xfe
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] [<ffffffff81143f8c>]
>>> ? filename_lookup+0x78/0xf5
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] [<ffffffff8112a9f9>]
>>> ? kmem_cache_alloc+0x99/0x124
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] [<ffffffff8113c544>]
>>> ? vfs_fstatat+0x46/0x83
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] [<ffffffff8113c544>]
>>> ? vfs_fstatat+0x46/0x83
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] [<ffffffff8113c5ca>]
>>> ? SYSC_newstat+0x10/0x27
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] [<ffffffff813f831b>]
>>> ? entry_SYSCALL_64_fastpath+0x13/0x8f
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] Code: db 75 02 0f 0b
>>> 41 83 fe 03 49 89 c5 74 16 41 83 fe 05
>>> 75 20 8b 53 5c 85 d2 75 02 0f 0b ff ca 89 53 5c eb 12 8b 53 58 85 d2 75
>>> 02 <0f> 0b ff ca 89 53 58 eb 02 0f 0b f6 43 30 04 74 24 8a 43 62 3c
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] RIP
>>> [<ffffffffa0535365>] __ocfs2_cluster_unlock.isra.34+0x4a/0x92 [ocfs2]
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] RSP <ffff8800bb2dbbe0>
>>> Oct 25 15:40:04 172.28.23.18 kernel: [ 266.414339] ---[ end trace
>>> 4eaf20faca7a8f81 ]---
>>>
>>>
>>> The server hard rebooted after this..
>>>
>>
>> .
>>
>
>
_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users