I just had two nodes hang with the following soft lockup messages.
I am running Centos 5.2 (2.6.18-93.1.13.el5) with the patchless client
(1.6.5.1).  My nodes do not have swap configured on them (no local
disks).   We do have a tool that looks for out of memory condition
and neither of the nodes in question reported a problem (not that it
is perfect).

Does the problem look like an issue with Lustre?

Thanks,
Craig


Oct 22 08:06:45 h53 kernel: BUG: soft lockup - CPU#2 stuck for 10s! 
[kswapd0:418]
Oct 22 08:06:45 h53 kernel: CPU 2:
Oct 22 08:06:45 h53 kernel: Modules linked in: zxfs(U) rs_iscsi(PFU) dm_mod(U) 
sd_mod(U) rs_cfg(PU) autofs4(U) mgc(U) lustre(U) 
lov(U) mdc(U) lquota(U) osc(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) 
lvfs(U) libcfs(U) rdma_ucm(U) ib_sdp(U) rdma_cm(U) iw_cm(U) 
ib_addr(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U) ib_sa(U) ipv6(U) xfrm_nalgo(U) 
crypto_api(U) ib_uverbs(U) ib_umad(U) mlx4_ib(U) 
mlx4_core(U) nfsd(U) exportfs(U) auth_rpcgss(U) video(U) sbs(U) backlight(U) 
i2c_ec(U) button(U) battery(U) asus_acpi(U) 
acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) joydev(U) sr_mod(U) 
cdrom(U) sg(U) ib_mthca(U) pcspkr(U) usb_storage(U) 
ata_piix(U) ib_mad(U) libata(U) ib_core(U) i2c_i801(U) i2c_core(U) shpchp(U) 
scsi_mod(U) serio_raw(U) ehci_hcd(U) ohci_hcd(U) 
uhci_hcd(U) nfs(U) lockd(U) fscache(U) nfs_acl(U) sunrpc(U) e1000(U) e1000e(U)
Oct 22 08:06:45 h53 kernel: Pid: 418, comm: kswapd0 Tainted: PF     
2.6.18-92.1.13.el5 #1
Oct 22 08:06:45 h53 kernel: RIP: 0010:[<ffffffff80064be4>]  
[<ffffffff80064be4>] .text.lock.spinlock+0x2/0x30
Oct 22 08:06:45 h53 kernel: RSP: 0000:ffff81042ea099b8  EFLAGS: 00000286
Oct 22 08:06:45 h53 kernel: RAX: ffff810423adeee8 RBX: ffff81035d272200 RCX: 
ffff8101c866eb28
Oct 22 08:06:45 h53 kernel: RDX: 0000000000000000 RSI: ffff810282e8eb10 RDI: 
ffff81035d272388
Oct 22 08:06:45 h53 kernel: RBP: 0000000000000086 R08: ffff81000001b600 R09: 
ffff81010af7be18
Oct 22 08:06:45 h53 kernel: R10: ffff810107b41a60 R11: ffffffff888117d0 R12: 
0000000000000086
Oct 22 08:06:45 h53 kernel: R13: ffff81010d4dce88 R14: ffff81010eb8f0c0 R15: 
0000000000000086
Oct 22 08:06:45 h53 kernel: FS:  0000000000000000(0000) 
GS:ffff81010eba1ec0(0000) knlGS:0000000000000000
Oct 22 08:06:45 h53 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Oct 22 08:06:45 h53 kernel: CR2: 00000000371b1000 CR3: 000000014772f000 CR4: 
00000000000006e0
Oct 22 08:06:45 h53 kernel:
Oct 22 08:06:45 h53 kernel: Call Trace:
Oct 22 08:06:45 h53 kernel:  [<ffffffff8871125a>] 
:osc:cache_remove_extent+0x4a/0x90
Oct 22 08:06:45 h53 kernel:  [<ffffffff88707c5a>] 
:osc:osc_teardown_async_page+0x25a/0x3c0
Oct 22 08:06:45 h53 kernel:  [<ffffffff800d4589>] slab_destroy+0x80/0xa8
Oct 22 08:06:45 h53 kernel:  [<ffffffff800d46cd>] free_block+0x11c/0x13b
Oct 22 08:06:45 h53 kernel:  [<ffffffff8877c8e0>] 
:lov:lov_teardown_async_page+0x320/0x4e0
Oct 22 08:06:45 h53 kernel:  [<ffffffff801453a1>] radix_tree_delete+0x150/0x187
Oct 22 08:06:45 h53 kernel:  [<ffffffff887f7ba1>] 
:lustre:__ll_put_llap+0x3e1/0x610
Oct 22 08:06:45 h53 kernel:  [<ffffffff80023601>] __pagevec_free+0x21/0x2e
Oct 22 08:06:45 h53 kernel:  [<ffffffff887f830d>] 
:lustre:ll_removepage+0x1dd/0x240
Oct 22 08:06:45 h53 kernel:  [<ffffffff800c087c>] 
__remove_from_page_cache+0x1b/0x3a
Oct 22 08:06:45 h53 kernel:  [<ffffffff888117e0>] 
:lustre:ll_releasepage+0x10/0x20
Oct 22 08:06:45 h53 kernel:  [<ffffffff800c4fb1>] 
shrink_inactive_list+0x4e1/0x7f9
Oct 22 08:06:45 h53 kernel:  [<ffffffff800479c3>] __pagevec_release+0x19/0x22
Oct 22 08:06:45 h53 kernel:  [<ffffffff800c49af>] shrink_active_list+0x416/0x426
Oct 22 08:06:45 h53 kernel:  [<ffffffff80012c22>] shrink_zone+0xf6/0x11c
Oct 22 08:06:45 h53 kernel:  [<ffffffff8005792d>] kswapd+0x337/0x45e
Oct 22 08:06:45 h53 kernel:  [<ffffffff8009e30e>] 
autoremove_wake_function+0x0/0x2e
Oct 22 08:06:45 h53 kernel:  [<ffffffff8009e0e8>] 
keventd_create_kthread+0x0/0xc4
Oct 22 08:06:45 h53 kernel:  [<ffffffff800575f6>] kswapd+0x0/0x45e
Oct 22 08:06:45 h53 kernel:  [<ffffffff8009e0e8>] 
keventd_create_kthread+0x0/0xc4
Oct 22 08:06:45 h53 kernel:  [<ffffffff80032625>] kthread+0xfe/0x132
Oct 22 08:06:45 h53 kernel:  [<ffffffff8005dfb1>] child_rip+0xa/0x11
Oct 22 08:06:45 h53 kernel:  [<ffffffff8009e0e8>] 
keventd_create_kthread+0x0/0xc4
Oct 22 08:06:45 h53 kernel:  [<ffffffff80032527>] kthread+0x0/0x132
Oct 22 08:06:45 h53 kernel:  [<ffffffff8005dfa7>] child_rip+0x0/0x11
Oct 22 08:06:45 h53 kernel:
Oct 22 08:06:46 h53 kernel: BUG: soft lockup - CPU#4 stuck for 10s! 
[convert_emiss.e:27258]
Oct 22 08:06:46 h53 kernel: CPU 4:
Oct 22 08:06:46 h53 kernel: Modules linked in: zxfs(U) rs_iscsi(PFU) dm_mod(U) 
sd_mod(U) rs_cfg(PU) autofs4(U) mgc(U) lustre(U) 
lov(U) mdc(U) lquota(U) osc(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) 
lvfs(U) libcfs(U) rdma_ucm(U) ib_sdp(U) rdma_cm(U) iw_cm(U) 
ib_addr(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U) ib_sa(U) ipv6(U) xfrm_nalgo(U) 
crypto_api(U) ib_uverbs(U) ib_umad(U) mlx4_ib(U) 
mlx4_core(U) nfsd(U) exportfs(U) auth_rpcgss(U) video(U) sbs(U) backlight(U) 
i2c_ec(U) button(U) battery(U) asus_acpi(U) 
acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) joydev(U) sr_mod(U) 
cdrom(U) sg(U) ib_mthca(U) pcspkr(U) usb_storage(U) 
ata_piix(U) ib_mad(U) libata(U) ib_core(U) i2c_i801(U) i2c_core(U) shpchp(U) 
scsi_mod(U) serio_raw(U) ehci_hcd(U) ohci_hcd(U) 
uhci_hcd(U) nfs(U) lockd(U) fscache(U) nfs_acl(U) sunrpc(U) e1000(U) e1000e(U)
Oct 22 08:06:46 h53 kernel: Pid: 27258, comm: convert_emiss.e Tainted: PF     
2.6.18-92.1.13.el5 #1
Oct 22 08:06:46 h53 kernel: RIP: 0010:[<ffffffff80064be7>]  
[<ffffffff80064be7>] .text.lock.spinlock+0x5/0x30
Oct 22 08:06:46 h53 kernel: RSP: 0018:ffff8102ba4c3770  EFLAGS: 00000286
Oct 22 08:06:46 h53 kernel: RAX: ffff810423adeec0 RBX: ffff8102e617aad0 RCX: 
0000000000000000
Oct 22 08:06:46 h53 kernel: RDX: 0000000000000000 RSI: 0000000004df5578 RDI: 
ffff810423adeec0
Oct 22 08:06:46 h53 kernel: RBP: 0000000000000000 R08: 0000000000000002 R09: 
0000000000000000
Oct 22 08:06:46 h53 kernel: R10: ffff810164619cb0 R11: 00000000000000d0 R12: 
0000000000000000
Oct 22 08:06:46 h53 kernel: R13: 0000000000000000 R14: ffff810423c36468 R15: 
ffff81042ebc4000
Oct 22 08:06:46 h53 kernel: FS:  00002b8ce6c66630(0000) 
GS:ffff81042fc23d40(0000) knlGS:0000000000000000
Oct 22 08:06:46 h53 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 22 08:06:46 h53 kernel: CR2: 0000000001e46604 CR3: 0000000307228000 CR4: 
00000000000006e0
Oct 22 08:06:46 h53 kernel:
Oct 22 08:06:46 h53 kernel: Call Trace:
Oct 22 08:06:46 h53 kernel:  [<ffffffff887066d5>] 
:osc:osc_queue_async_io+0x7c5/0x1020
Oct 22 08:06:46 h53 kernel:  [<ffffffff8024394c>] 
udp_push_pending_frames+0x236/0x25b
Oct 22 08:06:46 h53 kernel:  [<ffffffff80064b11>] _spin_lock_bh+0x9/0x14
Oct 22 08:06:46 h53 kernel:  [<ffffffff80030b4b>] release_sock+0x13/0xaa
Oct 22 08:06:46 h53 kernel:  [<ffffffff8005215c>] udp_sendmsg+0x47f/0x566
Oct 22 08:06:46 h53 kernel:  [<ffffffff887fcb44>] 
:lustre:ll_prepare_write+0x484/0x1360
Oct 22 08:06:46 h53 kernel:  [<ffffffff888117d0>] 
:lustre:ll_releasepage+0x0/0x20
Oct 22 08:06:46 h53 kernel:  [<ffffffff88628574>] 
:ptlrpc:ldlm_resource_get+0x1b4/0xa70
Oct 22 08:06:46 h53 kernel:  [<ffffffff8877daf3>] 
:lov:lov_queue_async_io+0x353/0x450
Oct 22 08:06:46 h53 kernel:  [<ffffffff886263c6>] 
:ptlrpc:ldlm_resource_putref+0x1b6/0x3a0
Oct 22 08:06:46 h53 kernel:  [<ffffffff887f3eb0>] 
:lustre:queue_or_sync_write+0x310/0x1270
Oct 22 08:06:46 h53 kernel:  [<ffffffff8002e341>] __wake_up+0x38/0x4f
Oct 22 08:06:46 h53 kernel:  [<ffffffff887faf3c>] 
:lustre:ll_commit_write+0x2dc/0x6e0
Oct 22 08:06:46 h53 kernel:  [<ffffffff88797d0f>] 
:lov:lov_finish_set+0x67f/0x6e0
Oct 22 08:06:46 h53 kernel:  [<ffffffff8879b455>] :lov:lov_set_add_req+0x15/0x20
Oct 22 08:06:46 h53 kernel:  [<ffffffff8000fe3c>] 
generic_file_buffered_write+0x4dc/0x6d3
Oct 22 08:06:46 h53 kernel:  [<ffffffff887e4fb9>] 
:lustre:ll_inode_size_unlock+0x89/0xe0
Oct 22 08:06:46 h53 kernel:  [<ffffffff8000df96>] current_fs_time+0x3b/0x40
Oct 22 08:06:46 h53 kernel:  [<ffffffff88798533>] 
:lov:lov_fini_cancel_set+0x1e3/0x260
Oct 22 08:06:46 h53 kernel:  [<ffffffff80016088>] 
__generic_file_aio_write_nolock+0x36c/0x3b8
Oct 22 08:06:46 h53 kernel:  [<ffffffff8863cf60>] 
:ptlrpc:ldlm_completion_ast+0x0/0x6a0
Oct 22 08:06:46 h53 kernel:  [<ffffffff887d9ce0>] 
:lustre:ll_glimpse_callback+0x0/0x440
Oct 22 08:06:46 h53 kernel:  [<ffffffff800c0727>] 
__generic_file_write_nolock+0x8f/0xa8
Oct 22 08:06:46 h53 kernel:  [<ffffffff887cf509>] 
:lustre:ll_extent_unlock+0x419/0x520
Oct 22 08:06:46 h53 kernel:  [<ffffffff8009e30e>] 
autoremove_wake_function+0x0/0x2e
Oct 22 08:06:46 h53 kernel:  [<ffffffff887d054a>] 
:lustre:ll_file_get_tree_lock_iov+0x44a/0x5e0
Oct 22 08:06:46 h53 kernel:  [<ffffffff8011fb2f>] avc_has_perm+0x43/0x55
Oct 22 08:06:46 h53 kernel:  [<ffffffff80063ae4>] mutex_lock+0xd/0x1d
Oct 22 08:06:46 h53 kernel:  [<ffffffff800c0788>] generic_file_writev+0x48/0xa2
Oct 22 08:06:46 h53 kernel:  [<ffffffff887d129f>] 
:lustre:ll_file_writev+0xbbf/0xe20
Oct 22 08:06:46 h53 kernel:  [<ffffffff887d151a>] 
:lustre:ll_file_write+0x1a/0x20
Oct 22 08:06:46 h53 kernel:  [<ffffffff8001649a>] vfs_write+0xce/0x174
Oct 22 08:06:46 h53 kernel:  [<ffffffff80016d67>] sys_write+0x45/0x6e
Oct 22 08:06:46 h53 kernel:  [<ffffffff8005d116>] system_call+0x7e/0x83
Oct 22 08:06:46 h53 kernel:



-- 
Craig Tierney ([EMAIL PROTECTED])
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss

Reply via email to