Hi Kasper,

that's exactly what we usually do when we have identified some misbehavior: try to find the right setting to mitigate the issue. If you see cache pressure messages, it might be more helpful to decrease mds_recall_max_caps (default: 30000) rather than increase it (your setting is 33000). Mykola once helped explain that [0]; maybe it helps here as well. I can't recall having tweaked mds_max_caps_per_client myself yet, but yeah, I would try to make sense of the settings and the observed behavior.
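
For example, something along these lines (just a sketch, the value 20000 is made up, and you'd substitute your MDS name):

sudo ceph config set mds mds_recall_max_caps 20000
sudo ceph config show mds.<name> | grep mds_recall

As far as I know these options take effect at runtime, so you can lower the value step by step and watch whether the recall warnings change before settling on something.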

Regards,
Eugen

[0] https://tracker.ceph.com/issues/57115

Quoting Kasper Rasmussen <kasper_steenga...@hotmail.com>:

I have a CephFS where workloads use many small files.
I see cache pressure / MDS_CLIENT_RECALL warnings once in a while (due to clients exceeding mds_max_caps_per_client), and it seems that if they linger too long, it ends up with more warnings, e.g. MDS_SLOW_REQUESTS, and some directories lock up.

Anyway - currently I have mds_max_caps_per_client set to 2M, and judging by the output of

sudo ceph tell mds.<name> counter dump
..
..
"counters": {
"cap_hits": 8122912454,
"cap_miss": 497593,
"avg_read_latency": 0.000000028,
"avg_write_latency": 0.000000000,
"avg_metadata_latency": 0.000000000,
"dentry_lease_hits": 5630994071,
"dentry_lease_miss": 174816044,
"opened_files": 65,
"opened_inodes": 2106823,
"pinned_icaps": 2106823,
"total_inodes": 2106823,
"total_read_ops": 309938,
"total_read_size": 191662499168,
"total_write_ops": 371242,
"total_write_size": 414398493835
..
..

that is not enough. However, there are not a lot of open files (opened_files is only 65, while pinned_icaps sits at ~2.1M).

Checking the "ceph_mds_client_metrics_<fs_name>_pinned_icaps" gauge in Prometheus tells the same story: the client is constantly hitting the max caps ceiling for days (clients have long-running jobs).
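
For completeness, this is roughly how I pull that gauge (a sketch - <prometheus-host> is a placeholder, and the exact labels depend on how the metrics are exported):

curl -s 'http://<prometheus-host>:9090/api/v1/query' --data-urlencode 'query=ceph_mds_client_metrics_<fs_name>_pinned_icaps'

and compare the result against the 2M limit.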


Can anyone share experience with changing mds_max_caps_per_client to accommodate such workloads? When changing this, should other config variables be taken into account, such as:

mds_cache_memory_limit - currently: 36GB
mds_cache_trim_decay_rate - currently: 0.9
mds_cache_trim_threshold - currently: 288358
mds_recall_max_caps - currently: 33000
mds_recall_max_decay_rate - currently: 1.35

Or should they be tuned on an observe-and-change-as-needed basis?
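
For reference, I read the current values with ceph config get, e.g.:

sudo ceph config get mds mds_max_caps_per_client
sudo ceph config get mds mds_recall_max_caps
sudo ceph config get mds mds_cache_memory_limit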

Thanks in advance.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

