I am regularly seeing the kernel panic below on a Lustre client whenever robinhood is running on that client. The Lustre and kernel versions on
the client and the server are as follows.
Patchless client version:
Lustre version: 1.8.5
Kernel: 2.6.18-194.32.1.el5
Lustre server version:
Lustre version: 1.8.5
Kernel: 2.6.18-194.17.1.el5_lustre.1.8.5
Could someone please confirm whether statahead is still a problem when running Lustre 1.8.5 on both the clients and the servers, and whether
statahead has to be disabled on this client?
Thanks
Nirmal
2011-01-31 14:43:33 Lustre: lustre-OST0020-osc-ffff81023afd2000: Connection to service lustre-OST0020 via nid 192.168.243.233@o2ib was lost; in progress operations using this service will wait for recovery to complete.
2011-02-01 10:46:59 LustreError: 27320:0:(mdc_locks.c:648:mdc_enqueue()) ldlm_cli_enqueue: -4
2011-02-01 10:46:59 list_add corruption. prev->next should be ffff81043ea0a530, but was ffff8100448c06c0
2011-02-01 10:46:59 ----------- [cut here ] --------- [please bite here ] ---------
2011-02-01 10:46:59 Kernel BUG at lib/list_debug.c:31
2011-02-01 10:46:59 invalid opcode: 0000 [1] SMP
2011-02-01 10:46:59 last sysfs file: /devices/pci0000:00/0000:00:0a.0/0000:0e:00.0/irq
2011-02-01 10:46:59 CPU 6
2011-02-01 10:46:59 Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler iptable_nat ip_nat iptable_mangle mgc(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) libafs(PU) autofs4 nfs fscache nfs_acl lockd sunrpc ip_conntrack_netbios_ns xt_state ip_conntrack nfnetlink ipt_REJECT iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ib_iser libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi ib_srp ib_sdp ib_ipoib ipoib_helper ipv6 xfrm_nalgo crypto_api rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa dm_mirror dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport sr_mod sg joydev ide_cd usb_storage i2c_piix4 amd64_edac_mod cdrom e1000e i2c_core edac_mc ib_mthca ib_mad ib_core bnx2 serio_raw pcspkr dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache sata_svw libata shpchp megaraid_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
2011-02-01 10:46:59 Pid: 3005, comm: ll_sa_423 Tainted: P 2.6.18-194.32.1.el5 #1
2011-02-01 10:46:59 RIP: 0010:[<ffffffff801544ff>] [<ffffffff801544ff>] __list_add+0x48/0x68
2011-02-01 10:46:59 RSP: 0018:ffff8101c8bf1c70 EFLAGS: 00010286
2011-02-01 10:46:59 RAX: 0000000000000058 RBX: ffff81043ea0a530 RCX: ffffffff80311da8
2011-02-01 10:46:59 RDX: ffffffff80311da8 RSI: 0000000000000000 RDI: ffffffff80311da0
2011-02-01 10:46:59 RBP: ffff81028ba51858 R08: ffffffff80311da8 R09: 0000000000000001
2011-02-01 10:46:59 R10: 0000000000000080 R11: 0000000000000080 R12: ffff8101c8bf1cc0
2011-02-01 10:46:59 R13: ffff81023a5da200 R14: ffff81043ea0a078 R15: ffff81008b42a780
2011-02-01 10:46:59 FS: 00002b390c6d2960(0000) GS:ffff810107f22e40(0000) knlGS:00000000f7fd36c0
2011-02-01 10:46:59 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
2011-02-01 10:46:59 CR2: 0000003dfcc9a6e0 CR3: 000000043c239000 CR4: 00000000000006e0
2011-02-01 10:46:59 Process ll_sa_423 (pid: 3005, threadinfo ffff8101c8bf0000, task ffff81001d3fe7e0)
2011-02-01 10:46:59 Stack: ffff8101c8bf1cd0 ffff81043ea0a4b0 ffff8101c8bf1cc0 ffffffff88a65661
2011-02-01 10:46:59 0000000000000286 ffffffff88931803 ffffffff889ba1a0 ffff8101fd676000
2011-02-01 10:46:59 0000007000000098 000004b0000000d8 0000000000000104 00000001c8bf1cc0
2011-02-01 10:46:59 Call Trace:
2011-02-01 10:46:59 [<ffffffff88a65661>] :mdc:mdc_enter_request+0x61/0x220
2011-02-01 10:46:59 [<ffffffff88931803>] :ptlrpc:ptlrpc_set_add_new_req+0x93/0xb0
2011-02-01 10:46:59 [<ffffffff88a6802f>] :mdc:mdc_intent_getattr_async+0x20f/0x450
2011-02-01 10:46:59 [<ffffffff887d3378>] :libcfs:cfs_alloc+0x68/0xc0
2011-02-01 10:46:59 [<ffffffff88b30164>] :lustre:ll_statahead_thread+0xf44/0x1750
2011-02-01 10:46:59 [<ffffffff8008d0ad>] default_wake_function+0x0/0xe
2011-02-01 10:46:59 [<ffffffff800b7bf7>] audit_syscall_exit+0x336/0x362
2011-02-01 10:46:59 [<ffffffff8005dfb1>] child_rip+0xa/0x11
2011-02-01 10:46:59 [<ffffffff88b2f220>] :lustre:ll_statahead_thread+0x0/0x1750
2011-02-01 10:46:59 [<ffffffff8005dfa7>] child_rip+0x0/0x11
2011-02-01 10:46:59
2011-02-01 10:46:59
2011-02-01 10:46:59 Code: 0f 0b 68 93 bb 2b 80 c2 1f 00 4c 89 63 08 49 89 1c 24 4c 89
2011-02-01 10:46:59 RIP [<ffffffff801544ff>] __list_add+0x48/0x68
2011-02-01 10:46:59 RSP <ffff8101c8bf1c70>
2011-02-01 10:46:59 <0>Kernel panic - not syncing: Fatal exception