Hi Kasper,

that's exactly what we usually do when we have identified some misbehavior: try to find the right setting to mitigate the issue. If you see cache pressure messages, it might be more helpful to decrease mds_recall_max_caps (default: 30000) rather than increase it (your setting is 33000). Mykola once helped explain that [0]; maybe it helps here as well. I can't recall having tweaked mds_max_caps_per_client myself yet, but yeah, I would try to make sense of the settings and the observed behavior.
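
For example, something along these lines (just a sketch, the value 20000 is made up, and you'd substitute your MDS name):

sudo ceph config set mds mds_recall_max_caps 20000
sudo ceph config show mds.<name> | grep mds_recall

As far as I know these options take effect at runtime, so you can lower the value step by step and watch whether the recall warnings change before settling on something.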

Regards,
Eugen

[0] https://tracker.ceph.com/issues/57115

Quoting Kasper Rasmussen <kasper_steenga...@hotmail.com>:

I have a CephFS where workloads use many small files.
I see cache pressure / MDS_CLIENT_RECALL warnings once in a while (due to clients exceeding mds_max_caps_per_client), and it seems that if they linger too long, it ends up with more warnings, e.g. MDS_SLOW_REQUESTS, and some directories lock up.

Anyway - currently I have mds_max_caps_per_client set to 2M, and judging by the output of

sudo ceph tell mds.<name> counter dump
..
..
"counters": {
"cap_hits": 8122912454,
"cap_miss": 497593,
"avg_read_latency": 0.000000028,
"avg_write_latency": 0.000000000,
"avg_metadata_latency": 0.000000000,
"dentry_lease_hits": 5630994071,
"dentry_lease_miss": 174816044,
"opened_files": 65,
"opened_inodes": 2106823,
"pinned_icaps": 2106823,
"total_inodes": 2106823,
"total_read_ops": 309938,
"total_read_size": 191662499168,
"total_write_ops": 371242,
"total_write_size": 414398493835
..
..

that is not enough. However, there are not a lot of open files (opened_files is only 65, while pinned_icaps sits at ~2.1M).

Checking the "ceph_mds_client_metrics_<fs_name>_pinned_icaps" gauge in Prometheus tells the same story: the client is constantly hitting the max caps ceiling for days (clients have long-running jobs).
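
For completeness, this is roughly how I pull that gauge (a sketch - <prometheus-host> is a placeholder, and the exact labels depend on how the metrics are exported):

curl -s 'http://<prometheus-host>:9090/api/v1/query' --data-urlencode 'query=ceph_mds_client_metrics_<fs_name>_pinned_icaps'

and compare the result against the 2M limit.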


Can anyone share experience with changing mds_max_caps_per_client to accommodate such workloads? When changing this, should other config variables be taken into account, such as:

mds_cache_memory_limit - currently: 36GB
mds_cache_trim_decay_rate - currently: 0.9
mds_cache_trim_threshold - currently: 288358
mds_recall_max_caps - currently: 33000
mds_recall_max_decay_rate - currently: 1.35

Or should they be tuned on an observe-and-change-as-needed basis?
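
For reference, I read the current values with ceph config get, e.g.:

sudo ceph config get mds mds_max_caps_per_client
sudo ceph config get mds mds_recall_max_caps
sudo ceph config get mds mds_cache_memory_limit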

Thanks in advance.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

