I've got another odd crash, hoping somebody will have a hint:
Testing various configurations of our lustre/ramdisk stuff. I've come up with
a reproducible case where I'm tripping an assertion failure in the generic
transaction code. The proximate cause seems to be running iozone with a
stripe size of 1MB. The fs is striped 64K, so that ought to be turning into a
bunch of parallel writes.
I did some digging around and was all excited to find cfs bug 10891, because
it looked like it was fixing something like this, but it doesn't seem to make
a difference.
This is kernel 2.6.15, mips64, and still running lustre 1.6b7 (because the
vanilla kernel support still isn't ready in 1.6.x). I'm hopeful that somebody
will recognize this and be able to point me at a fix, perhaps in the real 1.6.
TIA...
[4344894.240000] Assertion failure in journal_dirty_metadata() at
fs/jbd/transaction.c:1117: "handle->h_buffer_credits > 0"
[4344894.246000] Break instruction in kernel code[#1]:
[4344894.248000] Cpu 4
[4344894.249000] $ 0 : 0000000000000000 000000001480c4e0 0000000000000082
ffffffff80528e88
[4344894.257000] $ 4 : ffffffff80528eb8 ffffffff80530000 ffffffff80530000
ffffffff805f0000
[4344894.261000] $ 8 : ffffffff80530000 ffffffff80530000 ffffffff80530000
ffffffff805f0000
[4344894.265000] $12 : ffffffff805f0000 ffffffff805f0000 ffffffff805f0000
ffffffff805f0000
[4344894.269000] $16 : a8000000e8050c80 a8000000e96b3e88 a8000000e96467e8
a8000000ef4616b0
[4344894.272000] $20 : a8000000fa013640 ffffffff805f0000 0000000000001800
a8000000ef4616b0
[4344894.278000] $24 : ffffffff80530000 ffffffff80530000
[4344894.282000] $28 : a8000000f9ec0000 a8000000f9ecf190 a8000000f9ecf2a0
ffffffff80241138
[4344894.290000] Hi : 0000000000000002
[4344894.292000] Lo : 0000000000000000
[4344894.293000] epc : ffffffff80241138 journal_dirty_metadata+0x170/0x598
Tainted: P
[4344894.297000] ra : ffffffff80241138 journal_dirty_metadata+0x170/0x598
[4344894.301000] Status: 1480c4e3 KX SX UX KERNEL EXL IE
[4344894.304000] Cause : 00800024
[4344894.305000] PrId : 040e1301
[4344894.306000] Modules linked in: osc mds fsfilt_ldiskfs mgs mgc lustre lov
mdc ksocklnd ksclnd ptlrpc obdclass lvfs ldiskfs lnet libcfs sceth scio scfab
[4344894.315000] Process ll_mdt_19 (pid: 1703, threadinfo=a8000000f9ec0000,
task=a8000000f02b4540)
[4344894.319000] Stack : a8000000e96b3e88 ffffffff805f0000 a8000000ef4616b0
a8000000e8e88168
[4344894.326000] a8000000f9ecf2c0 0000000000000000 c0000000028ec5a8
c0000000028ec474
[4344894.330000] 0000000000000060 ffffffff8023e298 0000000000000040
0000000000000040
[4344894.334000] 00000000000072bf a800000011f3fae0 a800000011cc3f40
c000000000070000
[4344894.338000] a8000000f9ecf220 c000000002a9bf48 0000000000000000
a8000000ef4e3180
[4344894.341000] 0000000000000000 a8000000f9ecf2c0 a8000000e96b3e30
0000000000000800
[4344894.347000] a8000000e96c7800 0000000000001800 c0000000028eeef8
0000000000000000
[4344894.351000] a8000000e96b3e30 0000000000000c00 0000000000000000
0000000000000000
[4344894.357000] c000000008ecbefc a8000000e9316000 a800000011f3fae0
a8000000fa239700
[4344894.360000] a8000000f9ecf2c0 c000000002aa08f8 ffffffff00000000
0000000000000005
[4344894.364000] ...
[4344894.366000] Call Trace:
[4344894.367000] [<c0000000028ec5a8>] ldiskfs_getblk+0x278/0x668 [ldiskfs]
[4344894.370000] [<c0000000028ec474>] ldiskfs_getblk+0x144/0x668 [ldiskfs]
[4344894.372000] [<ffffffff8023e298>] do_get_write_access+0xa40/0xdb8
[4344894.375000] [<c000000002a9bf48>] llog_lvfs_write_blob+0x4f8/0xc08
[obdclass]
[4344894.378000] [<c0000000028eeef8>] ldiskfs_bread+0x30/0x1e8 [ldiskfs]
[4344894.381000] [<c000000008ecbefc>] fsfilt_ldiskfs_write_record+0x22c/0xa68
[fsfilt_ldiskfs]
[4344894.386000] [<c000000002aa08f8>] llog_lvfs_write_rec+0x1458/0x2918
[obdclass]
[4344894.389000] [<c000000002a9ba50>] llog_lvfs_write_blob+0x0/0xc08 [obdclass]
[4344894.392000] [<c000000002a9bf48>] llog_lvfs_write_blob+0x4f8/0xc08
[obdclass]
[4344894.397000] [<c000000002a9379c>] llog_cat_new_log+0xff4/0x2350 [obdclass]
[4344894.400000] [<c000000002a9ba50>] llog_lvfs_write_blob+0x0/0xc08 [obdclass]
[4344894.403000] [<c000000002aa0004>] llog_lvfs_write_rec+0xb64/0x2918
[obdclass]
[4344894.411000] [<ffffffff802b0000>] kobject_register+0x80/0xa0
[4344894.413000] [<ffffffff8013b650>] __might_sleep+0x0/0x158
[4344894.415000] [<ffffffff804513bc>] _spin_unlock_irqrestore+0x14/0x48
[4344894.418000] [<c000000002a9788c>] llog_cat_current_log+0x82c/0xb98
[obdclass]
[4344894.420000] [<ffffffff804514c8>] _spin_lock_irqsave+0x30/0x48
[4344894.423000] [<ffffffff802b3548>] __up_write+0x40/0x2e8
[4344894.425000] [<c000000002a90000>] llog_process+0x1f30/0x2e80 [obdclass]
[4344894.430000] [<c000000002a98350>] llog_cat_add_rec+0x1a0/0x1530 [obdclass]
[4344894.433000] [<c0000000028f0000>] ldiskfs_prepare_write+0x188/0x290
[ldiskfs]
[4344894.436000] [<c0000000028eae14>] ldiskfs_get_block_handle+0x104/0x12f8
[ldiskfs]
[4344894.441000] [<c000000002aada54>] llog_obd_origin_add+0xec/0x5c0 [obdclass]
[4344894.444000] [<ffffffff801ba884>] __find_get_block+0x1bc/0x418
[4344894.447000] [<c0000000037d06b4>] _ldlm_lock_debug+0x1d4/0x838 [ptlrpc]
[4344894.456000] [<c000000002aaa5cc>] llog_add+0xfc/0x960 [obdclass]
[4344894.459000] [<c000000008dd0d88>] lov_llog_origin_add+0x168/0x5b8 [lov]
[4344894.461000] [<ffffffff801bcaa0>] ll_rw_block+0x0/0x450
[4344894.464000] [<c0000000028ec474>] ldiskfs_getblk+0x144/0x668 [ldiskfs]
[4344894.466000] [<ffffffff801c0000>] end_buffer_read_sync+0x190/0x228
[4344894.469000] [<c000000002aaa5cc>] llog_add+0xfc/0x960 [obdclass]
[4344894.471000] [<c000000008e1cc68>] lov_alloc_memmd+0x178/0xf60 [lov]
[4344894.478000] [<c000000008f70a4c>] mds_llog_origin_add+0x10c/0x430 [mds]
[4344894.481000] [<c000000002aaa5cc>] llog_add+0xfc/0x960 [obdclass]
[4344894.483000] [<ffffffff802b0000>] kobject_register+0x80/0xa0
[4344894.490000] [<c000000008f77660>] mds_log_op_unlink+0x19f0/0x3cc0 [mds]
[4344894.493000] [<c000000008f774bc>] mds_log_op_unlink+0x184c/0x3cc0 [mds]
[4344894.495000] [<c0000000028ffcc8>] __ldiskfs_journal_stop+0x48/0xa0
[ldiskfs]
[4344894.498000] [<ffffffff8020baf0>] dnotify_parent+0x60/0x198
[4344894.502000] [<c0000000028f8b88>] ldiskfs_unlink+0xf8/0x340 [ldiskfs]
[4344894.505000] [<c000000003874730>] lustre_msg_buf+0x0/0x158 [ptlrpc]
[4344894.508000] [<c000000003883030>] lustre_msg_buflen+0x0/0x1f8 [ptlrpc]
[4344894.513000] [<c000000008ffa91c>] mds_reint_unlink+0x3aec/0x65a0 [mds]
[4344894.516000] [<c000000002a5cf70>] upcall_cache_get_entry+0x14b8/0x1fe8
[lvfs]
[4344894.518000] [<ffffffff801e3c4c>] dput+0x9c/0x3d0
[4344894.525000] [<c000000008fd921c>] mds_reint_rec+0x1fc/0x758 [mds]
[4344894.527000] [<c0000000090498f8>] mds_unlink_unpack+0x1c8/0x838 [mds]
[4344894.530000] [<c0000000090498cc>] mds_unlink_unpack+0x19c/0x838 [mds]
[4344894.533000] [<c000000008facf80>] mds_reint+0x568/0xda8 [mds]
[4344894.539000] [<c000000003870000>] lustre_msg_buf_v2+0x118/0x4d0 [ptlrpc]
[4344894.542000] [<c000000008fc1830>] mds_handle+0x5ed8/0x12dc0 [mds]
[4344894.545000] [<ffffffff8010c99c>] do_gettimeofday+0xf4/0x220
[4344894.551000] [<c000000000065e58>] libcfs_debug_vmsg2+0x330/0xfd0 [libcfs]
[4344894.554000] [<c000000000065c00>] libcfs_debug_vmsg2+0xd8/0xfd0 [libcfs]
[4344894.557000] [<c000000003895c24>] ptlrpc_main+0x352c/0x48f8 [ptlrpc]
[4344894.564000] [<c000000003888fc0>] ptlrpc_retry_rqbds+0x0/0x10 [ptlrpc]
[4344894.567000] [<c000000003888fc0>] ptlrpc_retry_rqbds+0x0/0x10 [ptlrpc]
[4344894.569000] [<ffffffff8013cde8>] default_wake_function+0x0/0x20
[4344894.572000] [<ffffffff80108680>] kernel_thread_helper+0x10/0x18
[4344894.579000] [<ffffffff80108670>] kernel_thread_helper+0x0/0x18
[4344894.582000]
[4344894.582000]
[4344894.583000] Code: 2407045d 0c051fac 0102402d <0200000d> 8e620008
2442ffff ae620008 de420028 120200b7
_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss