I've got another odd crash, and I'm hoping somebody will have a hint:

I've been testing various configurations of our Lustre/ramdisk setup and have
come up with a reproducible case that trips an assertion failure in the generic
transaction (jbd) code.  The proximate cause seems to be running iozone with a
stripe size of 1MB.  The fs is striped at 64K, so that ought to turn into a
bunch of parallel writes.
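
For a sense of scale, here is a rough sketch of that arithmetic.  It assumes
the 1MB request simply gets cut at 64K stripe boundaries; stripe_chunks() is a
made-up helper for illustration, not anything from Lustre or iozone, and it
says nothing about how the pieces are spread across OSTs.

```python
def stripe_chunks(offset, length, stripe_size):
    """Cut the byte range [offset, offset + length) at stripe boundaries.

    Hypothetical helper: returns a list of (offset, length) pieces, each of
    which lies entirely inside one stripe-sized slot.
    """
    chunks = []
    end = offset + length
    while offset < end:
        # First stripe boundary strictly after the current offset.
        boundary = (offset // stripe_size + 1) * stripe_size
        piece = min(boundary, end) - offset
        chunks.append((offset, piece))
        offset += piece
    return chunks


KIB = 1024
pieces = stripe_chunks(0, 1024 * KIB, 64 * KIB)
print(len(pieces))  # 16 pieces for an aligned 1MB request
```

An unaligned request of the same size would touch one extra stripe slot, since
both its first and last pieces would be partial.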

I did some digging and got all excited when I found CFS bug 10891, because it
looked like it fixed something like this, but it doesn't seem to make a
difference.

This is kernel 2.6.15 on mips64, still running Lustre 1.6b7 (because vanilla
kernel support still isn't ready in 1.6.x).  I'm hoping somebody will recognize
this and be able to point me at a fix, perhaps in the real 1.6.

TIA...



[4344894.240000] Assertion failure in journal_dirty_metadata() at fs/jbd/transaction.c:1117: "handle->h_buffer_credits > 0"
[4344894.246000] Break instruction in kernel code[#1]:
[4344894.248000] Cpu 4
[4344894.249000] $ 0   : 0000000000000000 000000001480c4e0 0000000000000082 ffffffff80528e88
[4344894.257000] $ 4   : ffffffff80528eb8 ffffffff80530000 ffffffff80530000 ffffffff805f0000
[4344894.261000] $ 8   : ffffffff80530000 ffffffff80530000 ffffffff80530000 ffffffff805f0000
[4344894.265000] $12   : ffffffff805f0000 ffffffff805f0000 ffffffff805f0000 ffffffff805f0000
[4344894.269000] $16   : a8000000e8050c80 a8000000e96b3e88 a8000000e96467e8 a8000000ef4616b0
[4344894.272000] $20   : a8000000fa013640 ffffffff805f0000 0000000000001800 a8000000ef4616b0
[4344894.278000] $24   : ffffffff80530000 ffffffff80530000
[4344894.282000] $28   : a8000000f9ec0000 a8000000f9ecf190 a8000000f9ecf2a0 ffffffff80241138
[4344894.290000] Hi    : 0000000000000002
[4344894.292000] Lo    : 0000000000000000
[4344894.293000] epc   : ffffffff80241138 journal_dirty_metadata+0x170/0x598     Tainted: P
[4344894.297000] ra    : ffffffff80241138 journal_dirty_metadata+0x170/0x598
[4344894.301000] Status: 1480c4e3    KX SX UX KERNEL EXL IE 
[4344894.304000] Cause : 00800024
[4344894.305000] PrId  : 040e1301
[4344894.306000] Modules linked in: osc mds fsfilt_ldiskfs mgs mgc lustre lov mdc ksocklnd ksclnd ptlrpc obdclass lvfs ldiskfs lnet libcfs sceth scio scfab
[4344894.315000] Process ll_mdt_19 (pid: 1703, threadinfo=a8000000f9ec0000, task=a8000000f02b4540)
[4344894.319000] Stack : a8000000e96b3e88 ffffffff805f0000 a8000000ef4616b0 a8000000e8e88168
[4344894.326000]         a8000000f9ecf2c0 0000000000000000 c0000000028ec5a8 c0000000028ec474
[4344894.330000]         0000000000000060 ffffffff8023e298 0000000000000040 0000000000000040
[4344894.334000]         00000000000072bf a800000011f3fae0 a800000011cc3f40 c000000000070000
[4344894.338000]         a8000000f9ecf220 c000000002a9bf48 0000000000000000 a8000000ef4e3180
[4344894.341000]         0000000000000000 a8000000f9ecf2c0 a8000000e96b3e30 0000000000000800
[4344894.347000]         a8000000e96c7800 0000000000001800 c0000000028eeef8 0000000000000000
[4344894.351000]         a8000000e96b3e30 0000000000000c00 0000000000000000 0000000000000000
[4344894.357000]         c000000008ecbefc a8000000e9316000 a800000011f3fae0 a8000000fa239700
[4344894.360000]         a8000000f9ecf2c0 c000000002aa08f8 ffffffff00000000 0000000000000005
[4344894.364000]         ...
[4344894.366000] Call Trace:
[4344894.367000]  [<c0000000028ec5a8>] ldiskfs_getblk+0x278/0x668 [ldiskfs]
[4344894.370000]  [<c0000000028ec474>] ldiskfs_getblk+0x144/0x668 [ldiskfs]
[4344894.372000]  [<ffffffff8023e298>] do_get_write_access+0xa40/0xdb8
[4344894.375000]  [<c000000002a9bf48>] llog_lvfs_write_blob+0x4f8/0xc08 [obdclass]
[4344894.378000]  [<c0000000028eeef8>] ldiskfs_bread+0x30/0x1e8 [ldiskfs]
[4344894.381000]  [<c000000008ecbefc>] fsfilt_ldiskfs_write_record+0x22c/0xa68 [fsfilt_ldiskfs]
[4344894.386000]  [<c000000002aa08f8>] llog_lvfs_write_rec+0x1458/0x2918 [obdclass]
[4344894.389000]  [<c000000002a9ba50>] llog_lvfs_write_blob+0x0/0xc08 [obdclass]
[4344894.392000]  [<c000000002a9bf48>] llog_lvfs_write_blob+0x4f8/0xc08 [obdclass]
[4344894.397000]  [<c000000002a9379c>] llog_cat_new_log+0xff4/0x2350 [obdclass]
[4344894.400000]  [<c000000002a9ba50>] llog_lvfs_write_blob+0x0/0xc08 [obdclass]
[4344894.403000]  [<c000000002aa0004>] llog_lvfs_write_rec+0xb64/0x2918 [obdclass]
[4344894.411000]  [<ffffffff802b0000>] kobject_register+0x80/0xa0
[4344894.413000]  [<ffffffff8013b650>] __might_sleep+0x0/0x158
[4344894.415000]  [<ffffffff804513bc>] _spin_unlock_irqrestore+0x14/0x48
[4344894.418000]  [<c000000002a9788c>] llog_cat_current_log+0x82c/0xb98 [obdclass]
[4344894.420000]  [<ffffffff804514c8>] _spin_lock_irqsave+0x30/0x48
[4344894.423000]  [<ffffffff802b3548>] __up_write+0x40/0x2e8
[4344894.425000]  [<c000000002a90000>] llog_process+0x1f30/0x2e80 [obdclass]
[4344894.430000]  [<c000000002a98350>] llog_cat_add_rec+0x1a0/0x1530 [obdclass]
[4344894.433000]  [<c0000000028f0000>] ldiskfs_prepare_write+0x188/0x290 [ldiskfs]
[4344894.436000]  [<c0000000028eae14>] ldiskfs_get_block_handle+0x104/0x12f8 [ldiskfs]
[4344894.441000]  [<c000000002aada54>] llog_obd_origin_add+0xec/0x5c0 [obdclass]
[4344894.444000]  [<ffffffff801ba884>] __find_get_block+0x1bc/0x418
[4344894.447000]  [<c0000000037d06b4>] _ldlm_lock_debug+0x1d4/0x838 [ptlrpc]
[4344894.456000]  [<c000000002aaa5cc>] llog_add+0xfc/0x960 [obdclass]
[4344894.459000]  [<c000000008dd0d88>] lov_llog_origin_add+0x168/0x5b8 [lov]
[4344894.461000]  [<ffffffff801bcaa0>] ll_rw_block+0x0/0x450
[4344894.464000]  [<c0000000028ec474>] ldiskfs_getblk+0x144/0x668 [ldiskfs]
[4344894.466000]  [<ffffffff801c0000>] end_buffer_read_sync+0x190/0x228
[4344894.469000]  [<c000000002aaa5cc>] llog_add+0xfc/0x960 [obdclass]
[4344894.471000]  [<c000000008e1cc68>] lov_alloc_memmd+0x178/0xf60 [lov]
[4344894.478000]  [<c000000008f70a4c>] mds_llog_origin_add+0x10c/0x430 [mds]
[4344894.481000]  [<c000000002aaa5cc>] llog_add+0xfc/0x960 [obdclass]
[4344894.483000]  [<ffffffff802b0000>] kobject_register+0x80/0xa0
[4344894.490000]  [<c000000008f77660>] mds_log_op_unlink+0x19f0/0x3cc0 [mds]
[4344894.493000]  [<c000000008f774bc>] mds_log_op_unlink+0x184c/0x3cc0 [mds]
[4344894.495000]  [<c0000000028ffcc8>] __ldiskfs_journal_stop+0x48/0xa0 [ldiskfs]
[4344894.498000]  [<ffffffff8020baf0>] dnotify_parent+0x60/0x198
[4344894.502000]  [<c0000000028f8b88>] ldiskfs_unlink+0xf8/0x340 [ldiskfs]
[4344894.505000]  [<c000000003874730>] lustre_msg_buf+0x0/0x158 [ptlrpc]
[4344894.508000]  [<c000000003883030>] lustre_msg_buflen+0x0/0x1f8 [ptlrpc]
[4344894.513000]  [<c000000008ffa91c>] mds_reint_unlink+0x3aec/0x65a0 [mds]
[4344894.516000]  [<c000000002a5cf70>] upcall_cache_get_entry+0x14b8/0x1fe8 [lvfs]
[4344894.518000]  [<ffffffff801e3c4c>] dput+0x9c/0x3d0
[4344894.525000]  [<c000000008fd921c>] mds_reint_rec+0x1fc/0x758 [mds]
[4344894.527000]  [<c0000000090498f8>] mds_unlink_unpack+0x1c8/0x838 [mds]
[4344894.530000]  [<c0000000090498cc>] mds_unlink_unpack+0x19c/0x838 [mds]
[4344894.533000]  [<c000000008facf80>] mds_reint+0x568/0xda8 [mds]
[4344894.539000]  [<c000000003870000>] lustre_msg_buf_v2+0x118/0x4d0 [ptlrpc]
[4344894.542000]  [<c000000008fc1830>] mds_handle+0x5ed8/0x12dc0 [mds]
[4344894.545000]  [<ffffffff8010c99c>] do_gettimeofday+0xf4/0x220
[4344894.551000]  [<c000000000065e58>] libcfs_debug_vmsg2+0x330/0xfd0 [libcfs]
[4344894.554000]  [<c000000000065c00>] libcfs_debug_vmsg2+0xd8/0xfd0 [libcfs]
[4344894.557000]  [<c000000003895c24>] ptlrpc_main+0x352c/0x48f8 [ptlrpc]
[4344894.564000]  [<c000000003888fc0>] ptlrpc_retry_rqbds+0x0/0x10 [ptlrpc]
[4344894.567000]  [<c000000003888fc0>] ptlrpc_retry_rqbds+0x0/0x10 [ptlrpc]
[4344894.569000]  [<ffffffff8013cde8>] default_wake_function+0x0/0x20
[4344894.572000]  [<ffffffff80108680>] kernel_thread_helper+0x10/0x18
[4344894.579000]  [<ffffffff80108670>] kernel_thread_helper+0x0/0x18
[4344894.582000] 
[4344894.582000] 
[4344894.583000] Code: 2407045d  0c051fac  0102402d <0200000d> 8e620008  2442ffff  ae620008  de420028  120200b7
