Hello,

I have a very critical problem.

One of my OSSes hangs with a kernel panic when trying to mount the OSTs.

After mounting 11 of the 12 OSTs it goes into a kernel panic. The order in which they are mounted does not matter.

Any clues or hints?

I cannot really recover it and I have important data on it.

I already ran e2fsck, but it did not fix the problem. It had found a few inode count inconsistencies.

The kernel is 2.6.32-431.23.3.el6_lustre.x86_64

Red Hat Enterprise Linux Server release 6.7 (Santiago)

lustre-2.5.3-2.6.32_431.23.3.el6_lustre.x86_64.x86_64


Oct 30 04:58:52 psanaoss231 kernel: INFO: task tgt_recov:4569 blocked for more than 120 seconds.
Oct 30 04:58:52 psanaoss231 kernel:      Not tainted 2.6.32-431.23.3.el6_lustre.x86_64 #1
Oct 30 04:58:52 psanaoss231 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 30 04:58:52 psanaoss231 kernel: tgt_recov     D 0000000000000003     0  4569      2 0x00000080
Oct 30 04:58:52 psanaoss231 kernel: ffff880bf2ae1da0 0000000000000046 0000000000000000 0000000000000003
Oct 30 04:58:52 psanaoss231 kernel: ffff880bf2ae1d30 ffffffff81059096 ffff880bf2ae1d40 ffff880bf2a1d500
Oct 30 04:58:52 psanaoss231 kernel: ffff880bf2b01ab8 ffff880bf2ae1fd8 000000000000fbc8 ffff880bf2b01ab8
Oct 30 04:58:52 psanaoss231 kernel: Call Trace:
Oct 30 04:58:52 psanaoss231 kernel: [<ffffffff81059096>] ? enqueue_task+0x66/0x80
Oct 30 04:58:52 psanaoss231 kernel: [<ffffffffa07ae560>] ? check_for_clients+0x0/0x70 [ptlrpc]
Oct 30 04:58:52 psanaoss231 kernel: [<ffffffffa07afbcd>] target_recovery_overseer+0x9d/0x230 [ptlrpc]
Oct 30 04:58:52 psanaoss231 kernel: [<ffffffffa07ae250>] ? exp_connect_healthy+0x0/0x20 [ptlrpc]
Oct 30 04:58:52 psanaoss231 kernel: [<ffffffff8109afa0>] ? autoremove_wake_function+0x0/0x40
Oct 30 04:58:52 psanaoss231 kernel: [<ffffffffa07b6490>] ? target_recovery_thread+0x0/0x1920 [ptlrpc]
Oct 30 04:58:52 psanaoss231 kernel: [<ffffffffa07b69d0>] target_recovery_thread+0x540/0x1920 [ptlrpc]
Oct 30 04:58:52 psanaoss231 kernel: [<ffffffff81061d12>] ? default_wake_function+0x12/0x20
Oct 30 04:58:52 psanaoss231 kernel: [<ffffffffa07b6490>] ? target_recovery_thread+0x0/0x1920 [ptlrpc]
Oct 30 04:58:52 psanaoss231 kernel: [<ffffffff8109abf6>] kthread+0x96/0xa0
Oct 30 04:58:52 psanaoss231 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Oct 30 04:58:52 psanaoss231 kernel: [<ffffffff8109ab60>] ? kthread+0x0/0xa0
Oct 30 04:58:52 psanaoss231 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Oct 30 04:59:02 psanaoss231 kernel: Lustre: ana13-OST0004: Recovery over after 3:05, of 147 clients 146 recovered and 1 was evicted.
Oct 30 04:59:03 psanaoss231 kernel: Lustre: ana13-OST0004: Client 89ba817f-45c3-5e64-99a8-b472651bbe45 (at 172.21.52.213@o2ib) reconnecting
Oct 30 04:59:03 psanaoss231 kernel: Lustre: Skipped 94 previous similar messages
Oct 30 04:59:21 psanaoss231 kernel: LustreError: 4569:0:(ost_handler.c:1123:ost_brw_write()) Dropping timed-out write from 12345-172.21.49.129@tcp because locking object 0x0:14198730 took 153 seconds (limit was 30).
Oct 30 04:59:21 psanaoss231 kernel: Lustre: ana13-OST0005: Bulk IO write error with 3a71df2f-16e7-d507-2495-ab60364d8e7c (at 172.21.49.129@tcp), client will retry: rc -110
Oct 30 04:59:52 psanaoss231 kernel: ------------[ cut here ]------------
Oct 30 04:59:52 psanaoss231 kernel: kernel BUG at fs/jbd2/transaction.c:1033!
Oct 30 04:59:52 psanaoss231 kernel: invalid opcode: 0000 [#1] SMP
Oct 30 04:59:52 psanaoss231 kernel: last sysfs file: /sys/devices/system/cpu/online
Oct 30 04:59:52 psanaoss231 kernel: CPU 10
Oct 30 04:59:52 psanaoss231 kernel: Modules linked in: osp(U) ofd(U) lfsck(U) ost(U) mgc(U) fsfilt_ldiskfs(U) osd_ldiskfs(U) lquota(U) ldiskfs(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ksocklnd(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic crc32c_intel libcfs(U) nfs lockd fscache auth_rpcgss nfs_acl mpt3sas mpt2sas scsi_transport_sas raid_class mptctl mptbase autofs4 sunrpc ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 microcode power_meter iTCO_wdt iTCO_vendor_support dcdbas ipmi_devintf sb_edac edac_core lpc_ich mfd_core shpchp igb i2c_algo_bit i2c_core ses enclosure sg ixgbe dca ptp pps_core mdio ext4 jbd2 mbcache raid1 sd_mod crc_t10dif ahci wmi mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
Oct 30 04:59:52 psanaoss231 kernel:
Oct 30 04:59:52 psanaoss231 kernel: Pid: 4272, comm: ll_ost01_007 Not tainted 2.6.32-431.23.3.el6_lustre.x86_64 #1 Dell Inc. PowerEdge R620/0PXXHP
Oct 30 04:59:52 psanaoss231 kernel: RIP: 0010:[<ffffffffa01198ad>]  [<ffffffffa01198ad>] jbd2_journal_dirty_metadata+0x10d/0x150 [jbd2]
Oct 30 04:59:52 psanaoss231 kernel: RSP: 0018:ffff880c058437d0 EFLAGS: 00010246
Oct 30 04:59:52 psanaoss231 kernel: RAX: ffff880c05573dc0 RBX: ffff880c043b8d08 RCX: ffff88175b0fedc8
Oct 30 04:59:52 psanaoss231 kernel: RDX: 0000000000000000 RSI: ffff88175b0fedc8 RDI: 0000000000000000
Oct 30 04:59:52 psanaoss231 kernel: RBP: ffff880c058437f0 R08: 9010000000000000 R09: e886f5e8fbf37202
Oct 30 04:59:52 psanaoss231 kernel: R10: 0000000000000002 R11: 0000000000000000 R12: ffff880c040c26d8
Oct 30 04:59:52 psanaoss231 kernel: R13: ffff88175b0fedc8 R14: ffff88174728c800 R15: 0000000000000008
Oct 30 04:59:52 psanaoss231 kernel: FS:  0000000000000000(0000) GS:ffff8800282a0000(0000) knlGS:0000000000000000
Oct 30 04:59:52 psanaoss231 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Oct 30 04:59:52 psanaoss231 kernel: CR2: 00000034f304b750 CR3: 0000000001a85000 CR4: 00000000000407e0
Oct 30 04:59:52 psanaoss231 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 30 04:59:52 psanaoss231 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Oct 30 04:59:52 psanaoss231 kernel: Process ll_ost01_007 (pid: 4272, threadinfo ffff880c05842000, task ffff880c0634eaa0)
Oct 30 04:59:52 psanaoss231 kernel: Stack:
Oct 30 04:59:52 psanaoss231 kernel: ffff880c043b8d08 ffffffffa0d136f0 ffff88175b0fedc8 0000000000000000
Oct 30 04:59:52 psanaoss231 kernel: <d> ffff880c05843830 ffffffffa0cd100b ffff880c05843820 ffffffff8109af8f
Oct 30 04:59:52 psanaoss231 kernel: <d> ffff88175b105a40 ffff880c043b8d08 0000000000000018 ffff88175b0fedc8
Oct 30 04:59:52 psanaoss231 kernel: Call Trace:
Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0cd100b>] __ldiskfs_handle_dirty_metadata+0x7b/0x100 [ldiskfs]
Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff8109af8f>] ? wake_up_bit+0x2f/0x40
Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0d067c5>] ldiskfs_quota_write+0x165/0x210 [ldiskfs]
Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff811eef11>] v2_write_file_info+0xa1/0xe0
Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff811eb018>] dquot_acquire+0x138/0x140
Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0d05956>] ldiskfs_acquire_dquot+0x66/0xb0 [ldiskfs]
Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff811ecf8c>] dqget+0x2ac/0x390
Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff811ed51b>] dquot_initialize+0x7b/0x240
Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff8116f553>] ? kmem_cache_alloc_trace+0x1a3/0x1b0
Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0d05bb3>] ldiskfs_dquot_initialize+0x83/0xd0 [ldiskfs]
Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0dd0baf>] osd_attr_set+0x12f/0x540 [osd_ldiskfs]
Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0ecb969>] dt_attr_set.clone.2+0x29/0xc0 [ofd]
Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0ecf472>] ofd_attr_set+0x522/0x6c0 [ofd]
Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0ec0e68>] ofd_setattr+0x678/0xc10 [ofd]
Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa07eeeae>] ? lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc]
Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0e711bb>] ost_setattr+0x30b/0x930 [ost]
Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa0e741bd>] ost_handle+0x1f8d/0x44d0 [ost]
Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa07f68db>] ? ptlrpc_update_export_timer+0x4b/0x560 [ptlrpc]
Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa07fecf5>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa05164ce>] ? cfs_timer_arm+0xe/0x10 [libcfs]
Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa05273cf>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa07f63d9>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff810546b9>] ? __wake_up_common+0x59/0x90
Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa080005d>] ptlrpc_main+0xaed/0x1740 [ptlrpc]
Oct 30 04:59:52 psanaoss231 kernel: [<ffffffffa07ff570>] ? ptlrpc_main+0x0/0x1740 [ptlrpc]
Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff8109abf6>] kthread+0x96/0xa0
Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff8109ab60>] ? kthread+0x0/0xa0
Oct 30 04:59:52 psanaoss231 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Oct 30 04:59:52 psanaoss231 kernel: Code: c6 9c 03 00 00 4c 89 f7 e8 c1 21 41 e1 48 8b 33 ba 01 00 00 00 4c 89 e7 e8 11 ec ff ff 4c 89 f0 66 ff 00 66 66 90 e9 73 ff ff ff <0f> 0b eb fe 0f 0b eb fe 0f 0b 66 0f 1f 84 00 00 00 00 00 eb f5
Oct 30 04:59:52 psanaoss231 kernel: RIP  [<ffffffffa01198ad>] jbd2_journal_dirty_metadata+0x10d/0x150 [jbd2]
Oct 30 04:59:52 psanaoss231 kernel: RSP <ffff880c058437d0>
Oct 30 04:59:52 psanaoss231 kernel: ---[ end trace 5ceb40448d3277c6 ]---
Oct 30 04:59:52 psanaoss231 kernel: Kernel panic - not syncing: Fatal exception
Oct 30 04:59:52 psanaoss231 kernel: Pid: 4272, comm: ll_ost01_007 Tainted: G      D    --------------- 2.6.32-431.23.3.el6_lustre.x86_64 #1

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
