what about dmesg output? it's unlikely Lustre debug can help here, as the problem seems to be very internal to ldiskfs (the mballoc piece of it).
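(for anyone following along, a rough sketch of how to collect both on the OSS; `dmesg` and `lctl` are the standard tools here, but the debug flags below are only an example mask, not a recommendation for this particular bug)

```shell
# save the kernel ring buffer -- an ldiskfs/mballoc oops or assertion
# will usually land here rather than in the Lustre debug log
dmesg > /tmp/oss-dmesg.txt

# widen the Lustre debug mask (example flags only), then dump the
# in-kernel debug buffer to a file for attaching to the bug
lctl set_param debug="+malloc +trace"
lctl dk /tmp/oss-lustre-debug.log
```

These need root on the OSS node; `lctl dk` clears the in-kernel buffer as it dumps, so run it right after reproducing the problem.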
thanks, Alex

Aaron Knister wrote:
> I bumped up debugging, and here's (below) the last bit of debugging
> info from lustre that I have on the oss before it went belly up. My
> system is totally inoperable. Does anybody have any ideas?
>
> 00010000:00001000:4:1212157576.884909:0:6378:0:(ldlm_resource.c:865:ldlm_resource_add_lock()) About to add this lock:
> 00010000:00001000:4:1212157576.884910:0:6378:0:(ldlm_lock.c:1718:ldlm_lock_dump()) -- Lock dump: ffff81036e61f1c0/0x1fb135e1e3fd2cc6 (rc: 3) (pos: 0) (pid: 6378)
> 00010000:00001000:4:1212157576.884913:0:6378:0:(ldlm_lock.c:1726:ldlm_lock_dump()) Node: local
> 00010000:00001000:4:1212157576.884914:0:6378:0:(ldlm_lock.c:1735:ldlm_lock_dump()) Resource: ffff8103849211c0 (3129942/0)
> 00010000:00001000:4:1212157576.884915:0:6378:0:(ldlm_lock.c:1740:ldlm_lock_dump()) Req mode: PW, grant mode: PW, rc: 3, read: 0, write: 1 flags: 0x80004000
> 00010000:00001000:4:1212157576.884917:0:6378:0:(ldlm_lock.c:1746:ldlm_lock_dump()) Extent: 0 -> 18446744073709551615 (req 0-18446744073709551615)
> 00010000:00000040:4:1212157576.884920:0:6378:0:(ldlm_lock.c:615:ldlm_lock_decref_internal()) forcing cancel of local lock
> 00010000:00000010:4:1212157576.884922:0:6378:0:(ldlm_lockd.c:1357:ldlm_bl_to_thread()) kmalloced 'blwi': 120 at ffff81040e90a340 (tot 49135175)
> 00002000:00000040:4:1212157576.884925:0:6378:0:(lustre_fsfilt.h:194:fsfilt_start_log()) started handle ffff8103766dfc78 (0000000000000000)
> 00002000:00000040:4:1212157576.884930:0:6378:0:(lustre_fsfilt.h:270:fsfilt_commit()) committing handle ffff8103766dfc78
> 00002000:00000040:4:1212157576.884931:0:6378:0:(lustre_fsfilt.h:194:fsfilt_start_log()) started handle ffff8103766dfc78 (0000000000000000)
> 00000020:00000040:4:1212157576.884957:0:5557:0:(lustre_handles.c:121:class_handle_unhash_nolock()) removing object ffff81036e61f1c0 with handle 0x1fb135e1e3fd2cc6 from hash
> 00000100:00000010:4:1212157576.884960:0:5557:0:(client.c:394:ptlrpc_prep_set()) kmalloced 'set': 104 at ffff8104012d38c0 (tot 49135279)
> 00000100:00000010:4:1212157576.884962:0:5557:0:(client.c:457:ptlrpc_set_destroy()) kfreed 'set': 104 at ffff8104012d38c0 (tot 49135175).
> 00010000:00000040:4:1212157576.884964:0:5557:0:(ldlm_resource.c:818:ldlm_resource_putref()) putref res: ffff8103849211c0 count: 0
> 00010000:00000010:4:1212157576.884969:0:5557:0:(ldlm_resource.c:828:ldlm_resource_putref()) kfreed 'res->lr_lvb_data': 40 at ffff810379ded880 (tot 49135135).
> 00010000:00000010:4:1212157576.885000:0:5557:0:(ldlm_resource.c:829:ldlm_resource_putref()) slab-freed 'res': 224 at ffff8103849211c0 (tot 49135135).
> 00010000:00000010:4:1212157576.885002:0:5557:0:(ldlm_lockd.c:1657:ldlm_bl_thread_main()) kfreed 'blwi': 120 at ffff81040e90a340 (tot 49134791).
> 00002000:00000040:4:1212157576.885623:0:6378:0:(lustre_fsfilt.h:270:fsfilt_commit()) committing handle ffff8103766dfc78
> 00002000:00000002:4:1212157576.885625:0:6378:0:(filter.c:148:f_dput()) putting 3129942: ffff8103599cea98, count = 0
> 00002000:00080000:4:1212157576.885627:0:6378:0:(filter.c:2689:filter_destroy_precreated()) crew4-OST0001: after destroy: set last_objids[0] = 3129941
> 00002000:00000002:4:1212157576.885630:0:6378:0:(filter.c:607:filter_update_last_objid()) crew4-OST0001: server last_objid for group 0: 3129941
> 00002000:00000010:4:1212157576.912615:0:6485:0:(fsfilt-ldiskfs.c:747:fsfilt_ldiskfs_cb_func()) slab-freed 'fcb': 56 at ffff810371404920 (tot 49134335).
> 00010000:00000040:4:1212157576.912669:0:6378:0:(ldlm_lib.c:1556:target_committed_to_req()) last_committed 17896268, xid 3841
> 00000100:00000040:4:1212157576.912674:0:6378:0:(connection.c:191:ptlrpc_connection_addref()) connection=ffff8103fbe9e2c0 refcount 10 to [email protected]
> 00000100:00000040:4:1212157576.912678:0:6378:0:(niobuf.c:46:ptl_send_buf()) conn=ffff8103fbe9e2c0 id [EMAIL PROTECTED]
> 00000400:00000010:4:1212157576.912680:0:6378:0:(lib-lnet.h:247:lnet_md_alloc()) kmalloced 'md': 136 at ffff81040cb6cb80 (tot 9568949).
> 00000400:00000010:4:1212157576.912683:0:6378:0:(lib-lnet.h:295:lnet_msg_alloc()) kmalloced 'msg': 336 at ffff8104285e1e00 (tot 9569285).
> 00000100:00000040:4:1212157576.912693:0:6378:0:(connection.c:150:ptlrpc_put_connection()) connection=ffff8103fbe9e2c0 refcount 9 to [email protected]
> 00000100:00000040:4:1212157576.912695:0:6378:0:(service.c:648:ptlrpc_server_handle_request()) RPC PUTting export ffff8103848e9000 : new rpc_count 0
> 00000100:00000040:4:1212157576.912697:0:6378:0:(service.c:648:ptlrpc_server_handle_request()) PUTting export ffff8103848e9000 : new refcount 4
> 00000100:00000040:4:1212157576.912699:0:6378:0:(service.c:652:ptlrpc_server_handle_request()) PUTting export ffff8103848e9000 : new refcount 3
> 00000400:00000010:4:1212157576.912741:0:5351:0:(lib-lnet.h:269:lnet_md_free()) kfreed 'md': 136 at ffff81040cb6cb80 (tot 9569149).
> 00000400:00000010:4:1212157576.912744:0:5351:0:(lib-lnet.h:312:lnet_msg_free()) kfreed 'msg': 336 at ffff8104285e1e00 (tot 9568813).
>
> On Wed, May 28, 2008 at 8:03 PM, Aaron Knister <[EMAIL PROTECTED]> wrote:
>
>> Thank you very much for looking into this. I've attached my dmesg to
>> the bug. I looked at line number 1334 which the panic seems to
>> reference.
>> I can't figure out what it's doing though.
>>
>> On Wed, May 28, 2008 at 4:54 PM, Alex Zhuravlev <[EMAIL PROTECTED]> wrote:
>>
>>> Aaron Knister wrote:
>>>> I'm seeing this bug (14465) under heavy load on my OSSes. If
>>>> I reboot the MDS it appears to help... any ideas? What's the
>>>> status on this bug?
>>>
>>> could you attach your dmesg to the bug? as for the status - I'm
>>> still not able to reproduce this, nor have I found a possible
>>> cause, sorry.
>>>
>>> thanks, Alex

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
