Hi Kasper,
that's exactly what we usually do when we have identified some
misbehavior: try to find the right setting to mitigate the issue.
If you see cache pressure messages, it might be more helpful to
decrease mds_recall_max_caps (default: 30000) rather than to increase
it (your setting is 33000). Mykola once helped explain that [0], maybe
this could help here as well. I can't recall having tweaked
mds_max_caps_per_client myself yet, but yeah, I would try to make
sense of the settings and the observed behavior.
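If you want to experiment with it, the option can be changed at
runtime; a minimal sketch (20000 is just an illustrative value for
lowering it, not a recommendation):

  # check the value currently applied to the MDS daemons
  ceph config get mds mds_recall_max_caps
  # lower it somewhat below the default (illustrative value)
  ceph config set mds mds_recall_max_caps 20000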
Regards,
Eugen
[0] https://tracker.ceph.com/issues/57115
Quoting Kasper Rasmussen <kasper_steenga...@hotmail.com>:
I have a CephFS where workloads use many small files.
I see cache pressure / MDS_CLIENT_RECALL warnings once in a while
(due to clients exceeding mds_max_caps_per_client), and it seems that
if they linger too long, it ends up with more warnings, e.g.
MDS_SLOW_REQUESTS, and some directories lock up.
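When that happens, this is roughly what I use to look at the warnings
and the per-client cap counts (mds.<name> is a placeholder for one of
my MDS daemons):

  # show the current health warnings in detail
  ceph health detail
  # list client sessions, including num_caps per client
  ceph tell mds.<name> session ls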
Anyway - currently I have mds_max_caps_per_client set to 2M. Judging
by the output of

sudo ceph tell mds.<name> counter dump
..
..
"counters": {
"cap_hits": 8122912454,
"cap_miss": 497593,
"avg_read_latency": 0.000000028,
"avg_write_latency": 0.000000000,
"avg_metadata_latency": 0.000000000,
"dentry_lease_hits": 5630994071,
"dentry_lease_miss": 174816044,
"opened_files": 65,
"opened_inodes": 2106823,
"pinned_icaps": 2106823,
"total_inodes": 2106823,
"total_read_ops": 309938,
"total_read_size": 191662499168,
"total_write_ops": 371242,
"total_write_size": 414398493835
..
..
it is not enough. However, there are not a lot of open files.
Checking the "ceph_mds_client_metrics_<fs_name>_pinned_icaps" gauge
in Prometheus tells the same story: the client is constantly hitting
the max caps ceiling over days (clients have long-running jobs).
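For context, the kind of query I look at is roughly the following
(the 1d window is just an example, and the metric name is spelled as
above):

  # highest pinned caps reported over the last day
  max_over_time(ceph_mds_client_metrics_<fs_name>_pinned_icaps[1d])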
Can anyone share experience with changing mds_max_caps_per_client to
accommodate such workloads?
When changing it, should other config variables be taken into
account, like:
mds_cache_memory_limit - currently: 36GB
mds_cache_trim_decay_rate - currently: 0.9
mds_cache_trim_threshold - currently: 288358
mds_recall_max_caps - currently: 33000
mds_recall_max_decay_rate - currently: 1.35
Or should they be tuned on an observe-and-change-as-needed basis?
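For reference, this is roughly how I pull the currently applied
values from a daemon (mds.<name> is a placeholder):

  ceph config show mds.<name> | grep -E 'mds_cache|mds_recall|mds_max_caps'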
Thanks in advance.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io