On Mon, 14 Jan 2008, Niklas Edmundsson wrote:

> Lustre 1.6.4.1 on Ubuntu Dapper with Debian 2.6.18 AMD64 kernel. MDS
> LBUG:ed with:
>
> -------------8<--------------------
> Jan 12 10:39:40 LustreError: 6198:0:(mds_reint.c:1512:mds_orphan_add_link())
> ASSERTION(inode->i_nlink == 1) failed:dir nlink == 0
> Jan 12 10:39:40 LustreError: 6198:0:(mds_reint.c:1512:mds_orphan_add_link())
> LBUG
> Jan 12 10:39:40 Lustre: 6198:0:(linux-debug.c:168:libcfs_debug_dumpstack())
> showing stack for process 6198
> Jan 12 10:39:41 LustreError: dumping log to /tmp/lustre-log.1200130781.6198
> -------------8<--------------------

Ahem. It seems I got a little carried away with grep there and missed
the stack trace. This should be more complete:

---------------8<---------------
Jan 12 10:39:40 LustreError: 6198:0:(mds_reint.c:1512:mds_orphan_add_link()) ASSERTION(inode->i_nlink == 1) failed:dir nlink == 0
Jan 12 10:39:40 LustreError: 6198:0:(mds_reint.c:1512:mds_orphan_add_link()) LBUG
Jan 12 10:39:40 Lustre: 6198:0:(linux-debug.c:168:libcfs_debug_dumpstack()) showing stack for process 6198
Jan 12 10:39:40 ll_mdt_22 R running task 0 6198 1 6199 6197 (L-TLB)
Jan 12 10:39:40 343836365b3e343c 0036373832382e32 0000383338373433 0000000000000246
Jan 12 10:39:40 ffff8100f0697560 0000000000000018 343836365b3e303c 3030313132382e32
Jan 12 10:39:40 ffffffffff00205d ffff8100f06976fa ffff8101706976ef ffffffff805172e0
Jan 12 10:39:40 Call Trace:
Jan 12 10:39:40 [<ffffffff80315c71>] vsnprintf+0x5b1/0x630
Jan 12 10:39:40 [<ffffffff8021d470>] physflat_send_IPI_mask+0x0/0x80
Jan 12 10:39:40 [<ffffffff802360ef>] vprintk+0x2ef/0x320
Jan 12 10:39:40 [<ffffffff8022b923>] __wake_up_common+0x43/0x80
Jan 12 10:39:40 [<ffffffff8022b923>] __wake_up_common+0x43/0x80
Jan 12 10:39:40 [<ffffffff8023616e>] printk+0x4e/0x60
Jan 12 10:39:40 [<ffffffff802360ef>] vprintk+0x2ef/0x320
Jan 12 10:39:40 [<ffffffff802360ef>] vprintk+0x2ef/0x320
Jan 12 10:39:40 [<ffffffff802360ef>] vprintk+0x2ef/0x320
Jan 12 10:39:40 [<ffffffff80256450>] kallsyms_lookup+0xf0/0x230
Jan 12 10:39:40 [<ffffffff80256450>] kallsyms_lookup+0xf0/0x230
Jan 12 10:39:40 [<ffffffff8020b090>] printk_address+0xb0/0xc0
Jan 12 10:39:40 [<ffffffff8023616e>] printk+0x4e/0x60
Jan 12 10:39:40 [<ffffffff80255c2a>] module_text_address+0x3a/0x50
Jan 12 10:39:40 [<ffffffff802491da>] kernel_text_address+0x1a/0x30
Jan 12 10:39:40 [<ffffffff802491da>] kernel_text_address+0x1a/0x30
Jan 12 10:39:40 [<ffffffff8020b4cc>] show_trace+0x21c/0x250
Jan 12 10:39:40 [<ffffffff8020b5ea>] _show_stack+0xea/0x100
Jan 12 10:39:40 [<ffffffff883f3a0a>] :libcfs:lbug_with_loc+0x7a/0xc0
Jan 12 10:39:40 [<ffffffff8871bb01>] :mds:mds_orphan_add_link+0x641/0x7e0
Jan 12 10:39:40 [<ffffffff883cabfd>] :ldiskfs:__ldiskfs_journal_stop+0x2d/0x60
Jan 12 10:39:40 [<ffffffff802cb55b>] dnotify_parent+0x2b/0xa0
Jan 12 10:39:40 [<ffffffff802a81a3>] dput+0x23/0x170
Jan 12 10:39:40 [<ffffffff8871d498>] :mds:mds_reint_unlink+0x17f8/0x25f0
Jan 12 10:39:40 [<ffffffff8850ec47>] :ptlrpc:ptlrpc_prep_set+0x2c7/0x360
Jan 12 10:39:40 [<ffffffff802a81a3>] dput+0x23/0x170
Jan 12 10:39:40 [<ffffffff8870f7b9>] :mds:mds_reint_rec+0x1d9/0x2b0
Jan 12 10:39:40 [<ffffffff887357cc>] :mds:mds_unlink_unpack+0x29c/0x3c0
Jan 12 10:39:40 [<ffffffff884e6f91>] :ptlrpc:ldlm_run_cp_ast_work+0x171/0x200
Jan 12 10:39:40 [<ffffffff88734624>] :mds:mds_update_unpack+0x214/0x2b0
Jan 12 10:39:40 [<ffffffff886ff971>] :mds:mds_reint+0x4b1/0x5a0
Jan 12 10:39:40 [<ffffffff885201cf>] :ptlrpc:lustre_msg_get_version+0x4f/0x100
Jan 12 10:39:40 [<ffffffff8870beea>] :mds:mds_handle+0x2fca/0x5f88
Jan 12 10:39:40 [<ffffffff884ff878>] :ptlrpc:ldlm_cli_cancel+0x298/0x2c0
Jan 12 10:39:40 [<ffffffff802899d0>] __drain_alien_cache+0x60/0x90
Jan 12 10:39:40 [<ffffffff8022e812>] find_busiest_group+0x252/0x6c0
Jan 12 10:39:40 [<ffffffff8848ae45>] :obdclass:class_handle2object+0xd5/0x160
Jan 12 10:39:40 [<ffffffff8851c480>] :ptlrpc:lustre_swab_ptlrpc_body+0x0/0x90
Jan 12 10:39:40 [<ffffffff88521155>] :ptlrpc:lustre_swab_buf+0xc5/0xf0
Jan 12 10:39:40 [<ffffffff8852710a>] :ptlrpc:ptlrpc_server_handle_request+0xc8a/0x1460
Jan 12 10:39:40 [<ffffffff80416d20>] thread_return+0x0/0x100
Jan 12 10:39:40 [<ffffffff8020df9e>] do_gettimeofday+0x5e/0xb0
Jan 12 10:39:40 [<ffffffff883fbf06>] :libcfs:lcw_update_time+0x16/0x100
Jan 12 10:39:40 [<ffffffff8023f309>] lock_timer_base+0x29/0x60
Jan 12 10:39:40 [<ffffffff8023f7f0>] __mod_timer+0xc0/0xf0
Jan 12 10:39:40 [<ffffffff8852933c>] :ptlrpc:ptlrpc_main+0x85c/0x9e0
Jan 12 10:39:40 [<ffffffff8022f490>] default_wake_function+0x0/0x10
Jan 12 10:39:40 [<ffffffff8020ac4c>] child_rip+0xa/0x12
Jan 12 10:39:41 [<ffffffff88528ae0>] :ptlrpc:ptlrpc_main+0x0/0x9e0
Jan 12 10:39:41 [<ffffffff8020ac42>] child_rip+0x0/0x12
Jan 12 10:39:41
Jan 12 10:39:41 LustreError: dumping log to /tmp/lustre-log.1200130781.6198
---------------8<---------------

> I also have the lustre-log.1200130781.6198, but it seems to contain
> binary data so I'll supply it only if it's needed.
>
> The following triggered the bug:
> - mkdir rfiles
> - in rfiles create 300000 files of random size 0-32k
> - rm -rf rfiles &
> - sleep 600 (i.e. wait until you get bored and the rm isn't finished).
> - rm -rf rfiles &
>
> This suggests that something isn't locked properly since two
> concurrent rm's in a directory definitely shouldn't cause the MDS to
> fall over...

/Nikke
-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se      |     [EMAIL PROTECTED]
---------------------------------------------------------------------------
 An Elephant Is Just A Mouse Built To Gov't Specs!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
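
For anyone who wants to retry the reproduction steps quoted above, here is a rough Python sketch of them. It is only an illustration: the function names create_files and overlapping_rm and the file naming pattern are made up for this example and do not come from the original report, and the sketch only exercises Lustre if the directory lives on a Lustre client mount.

```python
import os
import random
import subprocess
import time


def create_files(directory, count, max_size=32 * 1024):
    """Create `directory` and fill it with `count` files of random size.

    Each file gets between 0 and `max_size` bytes (32k by default, as in
    the report). The "f%06d" naming is an assumption for illustration.
    """
    os.mkdir(directory)
    for i in range(count):
        size = random.randint(0, max_size)
        with open(os.path.join(directory, "f%06d" % i), "wb") as fh:
            fh.write(b"\0" * size)


def overlapping_rm(directory, delay):
    """Run two `rm -rf` processes on the same directory.

    The second one is started `delay` seconds after the first, mirroring
    "rm -rf rfiles &; sleep 600; rm -rf rfiles &". Returns both exit codes.
    """
    first = subprocess.Popen(["rm", "-rf", directory])
    time.sleep(delay)
    second = subprocess.Popen(["rm", "-rf", directory])
    return first.wait(), second.wait()
```

With the numbers from the report this would be create_files("rfiles", 300000) followed by overlapping_rm("rfiles", 600), run from a directory on the Lustre filesystem. The delay only matters in that the first rm must still be running when the second one starts.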