Hi all,

We have a small Lustre environment (one MGS/MDS, one MDT, two OSSes, and four OSTs, with ZFS backends on all targets). Occasionally we have issues with user jobs that rapidly create thousands of files, which chokes the MDS and leads to poor performance of the file system for other users (long wait times for directory listings, file creation, etc.). I've advised users to avoid this sort of workflow when possible, or to use local scratch storage when not, but I'd like to lessen the impact as much as I can when it happens.
When this occurs, the MDS processes have stack traces that look like:

  [<ffffffffb7992d77>] call_rwsem_down_write_failed+0x17/0x30
  [<ffffffffc16b3225>] lod_alloc_qos.constprop.18+0x205/0x1840 [lod]
  [<ffffffffc16b9847>] lod_qos_prep_create+0x12d7/0x1890 [lod]
  [<ffffffffc16ba015>] lod_prepare_create+0x215/0x2e0 [lod]
  [<ffffffffc16a9e1e>] lod_declare_striped_create+0x1ee/0x980 [lod]
  [<ffffffffc16ae6f4>] lod_declare_create+0x204/0x590 [lod]
  [<ffffffffc1724ca2>] mdd_declare_create_object_internal+0xe2/0x2f0 [mdd]
  [<ffffffffc17146dc>] mdd_declare_create+0x4c/0xcb0 [mdd]
  [<ffffffffc1718067>] mdd_create+0x847/0x14e0 [mdd]
  [<ffffffffc11cb5ff>] mdt_reint_open+0x224f/0x3240 [mdt]
  [<ffffffffc11be693>] mdt_reint_rec+0x83/0x210 [mdt]
  [<ffffffffc119b1b3>] mdt_reint_internal+0x6e3/0xaf0 [mdt]
  [<ffffffffc11a7a92>] mdt_intent_open+0x82/0x3a0 [mdt]
  [<ffffffffc11a5bb5>] mdt_intent_policy+0x435/0xd80 [mdt]
  [<ffffffffc1b8cd56>] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc]
  [<ffffffffc1bb5366>] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc]
  [<ffffffffc1c3db02>] tgt_enqueue+0x62/0x210 [ptlrpc]
  [<ffffffffc1c442ea>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
  [<ffffffffc1be929b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
  [<ffffffffc1becbfc>] ptlrpc_main+0xb2c/0x1460 [ptlrpc]
  [<ffffffffb76c61f1>] kthread+0xd1/0xe0
  [<ffffffffb7d8dd1d>] ret_from_fork_nospec_begin+0x7/0x21
  [<ffffffffffffffff>] 0xffffffffffffffff

which to me implies that they're waiting on the OSTs to allocate objects.

The OSTs are each a ZFS span of mirrors. I've disabled sync on the datasets and set the osd-zfs parameters osd_object_sync_delay_us and osd_txg_sync_delay_us to 0 (this FS is entirely scratch), which has improved things a bit, but we still have issues. Does anyone have any pointers for improving OST performance for this pathological use case?
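To check how many threads are stuck at the same point as in the trace above, something like the following could tally the innermost frame across a set of captured stacks. This is only a rough sketch: the helper names parse_frames and top_frame_counts are made up for illustration, and it assumes the traces have already been collected as text (for example by reading /proc/<pid>/stack as root on the MDS).

```python
import re
from collections import Counter

# Matches one kernel stack frame, e.g.
#   [<ffffffffc16b3225>] lod_alloc_qos.constprop.18+0x205/0x1840 [lod]
# The trailing "[module]" part is optional (core kernel frames lack it).
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def parse_frames(stack_text):
    """Return (symbol, module) tuples for one stack, innermost frame first.

    Frames without a module tag are reported with module "kernel".
    Lines that carry no symbol+offset (such as the terminating
    0xffffffffffffffff entry) are skipped.
    """
    return [
        (m.group(1), m.group(2) or "kernel")
        for m in FRAME_RE.finditer(stack_text)
    ]


def top_frame_counts(stacks):
    """Count how often each symbol is the innermost frame across stacks."""
    counts = Counter()
    for text in stacks:
        frames = parse_frames(text)
        if frames:
            counts[frames[0][0]] += 1
    return counts
```

If most threads report call_rwsem_down_write_failed as their innermost frame, with lod_alloc_qos directly beneath it, that would at least show they are all queued on the same write semaphore.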
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
