Thanks, that helps. Looks like the problem is that the MDS is not
automatically trimming its cache fast enough. Please try bumping
mds_cache_trim_threshold:

bin/ceph config set mds mds_cache_trim_threshold 512K
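
If you want to confirm the new value is active, you can query it back
(a quick sketch; the bin/ prefix assumes a vstart-style dev environment,
and <id> is a placeholder for your actual MDS daemon name):

bin/ceph config get mds mds_cache_trim_threshold
bin/ceph daemon mds.<id> config get mds_cache_trim_threshold

The first reads the centralized config database on the mons, the second
asks a running MDS directly over its admin socket.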

That did help, somewhat. I removed the aggressive recall settings I had set before and set only this option instead. The cache size seems quite stable now, although it is still growing in the long run (but at least not strictly monotonically).

However, my client processes are now basically in a constant I/O wait state and the CephFS is slow for everybody. After I restarted the copy job, I got around 4k reqs/s, which then dropped to 100 reqs/s with everybody waiting their turn. So yes, it does seem to help, but it increases latency by an order of magnitude.
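
(For reference, one way to watch that per-MDS request rate, not
necessarily the exact tooling used here, is the ACTIVITY column of the
status output:

ceph fs status
watch -n 1 'ceph fs status'

The active MDS shows a "Reqs: N /s" figure there.)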

As always, it would be great if these options were documented somewhere. Google has like five results, one of them being this thread. ;-)


Increase it further if it's not aggressive enough. Please let us know
if that helps.
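
For example (1M here is just an illustrative next step, not a tuned
recommendation), and you can watch how the cache reacts via the admin
socket, where <id> again stands for the actual MDS daemon name:

bin/ceph config set mds mds_cache_trim_threshold 1M
bin/ceph daemon mds.<id> cache status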

It shouldn't be necessary to do this, so I'll make a tracker ticket
once we confirm that's the issue.
