[lustre-discuss] Object creation bottleneck

Nehring, Shane R [LAS] Wed, 03 Jun 2020 10:42:08 -0700

Hi all,

We have a small Lustre environment (one MGS/MDS, one MDT, two OSSes,
four OSTs, with ZFS backends on all targets), and occasionally we have
issues with user jobs that rapidly create thousands of files. This
chokes the MDS, leading to poor file system performance for users (long
wait times for directory listings, file creation, etc.). I've advised
users to avoid this sort of workflow when possible, or to use local
scratch storage when not, but I'd like to lessen the impact as much as
I can when it happens.


When this occurs the mds processes have stack traces that look like:

[<ffffffffb7992d77>] call_rwsem_down_write_failed+0x17/0x30
[<ffffffffc16b3225>] lod_alloc_qos.constprop.18+0x205/0x1840 [lod]
[<ffffffffc16b9847>] lod_qos_prep_create+0x12d7/0x1890 [lod]
[<ffffffffc16ba015>] lod_prepare_create+0x215/0x2e0 [lod]
[<ffffffffc16a9e1e>] lod_declare_striped_create+0x1ee/0x980 [lod]
[<ffffffffc16ae6f4>] lod_declare_create+0x204/0x590 [lod]
[<ffffffffc1724ca2>] mdd_declare_create_object_internal+0xe2/0x2f0 [mdd]
[<ffffffffc17146dc>] mdd_declare_create+0x4c/0xcb0 [mdd]
[<ffffffffc1718067>] mdd_create+0x847/0x14e0 [mdd]
[<ffffffffc11cb5ff>] mdt_reint_open+0x224f/0x3240 [mdt]
[<ffffffffc11be693>] mdt_reint_rec+0x83/0x210 [mdt]
[<ffffffffc119b1b3>] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[<ffffffffc11a7a92>] mdt_intent_open+0x82/0x3a0 [mdt]
[<ffffffffc11a5bb5>] mdt_intent_policy+0x435/0xd80 [mdt]
[<ffffffffc1b8cd56>] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc]
[<ffffffffc1bb5366>] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc]
[<ffffffffc1c3db02>] tgt_enqueue+0x62/0x210 [ptlrpc]
[<ffffffffc1c442ea>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[<ffffffffc1be929b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[<ffffffffc1becbfc>] ptlrpc_main+0xb2c/0x1460 [ptlrpc]
[<ffffffffb76c61f1>] kthread+0xd1/0xe0
[<ffffffffb7d8dd1d>] ret_from_fork_nospec_begin+0x7/0x21
[<ffffffffffffffff>] 0xffffffffffffffff

which to me implies that they're waiting on the OSTs to allocate
objects. The OSTs are each a ZFS span of mirrors. Since this FS is
entirely scratch, I've disabled sync on the datasets and set the
osd-zfs parameters osd_object_sync_delay_us and osd_txg_sync_delay_us
to 0. That has improved things a bit, but we still have issues.
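
To check how many service threads are stuck in the same place, traces
like the one above can be tallied by function and module. The following
is a minimal Python sketch, assuming the traces have already been saved
as text; the names parse_frames and count_modules are hypothetical
helpers for illustration, not part of any Lustre tooling, and frames
without a bracketed module name are labelled "kernel" by assumption.

```python
import re
from collections import Counter

# Matches frames such as:
#   [<ffffffffc16b3225>] lod_alloc_qos.constprop.18+0x205/0x1840 [lod]
# The module tag is optional and may have been wrapped onto the next line.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

# A few frames copied from the trace above, including the wrapped [mdd] line.
SAMPLE = """\
[<ffffffffb7992d77>] call_rwsem_down_write_failed+0x17/0x30
[<ffffffffc16b3225>] lod_alloc_qos.constprop.18+0x205/0x1840 [lod]
[<ffffffffc1724ca2>] mdd_declare_create_object_internal+0xe2/0x2f0
[mdd]
[<ffffffffc11cb5ff>] mdt_reint_open+0x224f/0x3240 [mdt]
[<ffffffffb76c61f1>] kthread+0xd1/0xe0
[<ffffffffffffffff>] 0xffffffffffffffff
"""


def parse_frames(trace_text):
    """Return (function, module) tuples, innermost frame first.

    Frames with no bracketed module are reported as "kernel" (an
    assumption made for this sketch). The trailing
    "[<ffffffffffffffff>] 0xffffffffffffffff" terminator carries no
    symbol and is skipped.
    """
    return [
        (m.group("func"), m.group("module") or "kernel")
        for m in FRAME_RE.finditer(trace_text)
    ]


def count_modules(frames):
    """Count how many frames belong to each module."""
    return Counter(module for _, module in frames)


if __name__ == "__main__":
    frames = parse_frames(SAMPLE)
    for func, module in frames:
        print(f"{module:8s} {func}")
    print(count_modules(frames))
```

Running the same parser over the traces of every MDS service thread and
counting the innermost lod frame would show whether they are all
blocked at the same point in lod_alloc_qos.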

Does anyone have any pointers for improving OST performance for this
pathological use case?

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
