It's possible the MDS is not being aggressive enough about asking the
single (?) client to reduce its cache size. There were recent changes
[1] to the MDS to improve this. However, the defaults may not be
aggressive enough for your client's workload. Can you try:

ceph config set mds mds_recall_max_caps 10000
ceph config set mds mds_recall_max_decay_rate 1.0
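
To check whether the new values are active and whether the client is
actually giving caps back, something like this should work (mds.<name>
is a placeholder for your active MDS, and the "ceph daemon" command has
to be run on the host where that MDS runs):

ceph config dump | grep mds_recall
ceph daemon mds.<name> session ls | grep num_caps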

Thank you. I spent all week looking for config directives that do exactly this. Why are they not documented anywhere outside that blog post?

I added them as you described, and the MDS seems to have stabilized: it stays just under 1M inos now. I will keep monitoring it to see whether this holds up in the long run. Settings like these should be the default, IMHO. Clients should never be able to crash the server just by holding on to their capabilities. If a server decides to drop things from its cache, clients must deal with it. Anything else threatens the stability of the system (and, as we saw, may even prevent the MDS from ever starting again).
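
For reference, the dentry/inode counts and the cache size can be
watched with something like the following (mds.<name> is again a
placeholder, and "cache status" goes through the MDS admin socket):

ceph fs status
ceph daemon mds.<name> cache status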

Also, your other messages made me think you may still be using the old
inode limit for the cache size. Are you using the new
mds_cache_memory_limit config option?

No, I am not. I tried it at some point to see if it made things better, but just like the old inode limit, it seemed to have no effect whatsoever beyond delaying the health warning.
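
For completeness, this is roughly how the memory-based limit is set and
read back (the 8 GiB value is only an example; the option is a target
for cache memory, not a hard cap on the MDS process's RSS):

ceph config set mds mds_cache_memory_limit 8589934592
ceph daemon mds.<name> config get mds_cache_memory_limit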


Finally, if this fixes your issue (please let us know!) and you decide
to try multiple active MDS, you should definitely use pinning as the
parallel create workload will greatly benefit from it.

I will try that, although my directory tree is quite imbalanced.
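
For reference, a minimal sketch of what that would look like (the
filesystem name "cephfs" and the mount paths below are placeholders):

ceph fs set cephfs max_mds 2
setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/projects/a
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/projects/b
getfattr -n ceph.dir.pin /mnt/cephfs/projects/a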

