On Wed, May 8, 2019 at 4:10 AM Stolte, Felix <[email protected]> wrote:
>
> Hi folks,
>
> we are running a Luminous cluster and using CephFS for file services. We
> use Tivoli Storage Manager to back up all data in the Ceph filesystem to
> tape for disaster recovery. Backup runs on two dedicated servers, which
> mount CephFS via the kernel client. In order to complete the backup in
> time we are using 60 backup threads per server. While the backup is
> running, ceph health often changes from “OK” to “2 clients failing to
> respond to cache pressure”. After investigating and doing research in the
> mailing list I set the following parameters:
>
> mds_cache_memory_limit = 34359738368 (32 GB) on the MDS servers
>
> client_oc_size = 104857600 (100 MB, default is 200 MB) on the backup servers
>
> All servers run Ubuntu 18.04 with kernel 4.15.0-47 and Ceph 12.2.11. We
> have 3 MDS servers: 1 active, 2 standby. Changing to multiple active MDS
> servers is not an option, since we are planning to use snapshots. CephFS
> holds 78,815,975 files.
>
> Any advice on getting rid of the warning would be very much appreciated.
> On a side note: although the MDS cache memory limit is set to 32 GB, htop
> shows 60 GB memory usage for the ceph-mds process.
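For reference, a minimal sketch of where those two settings typically go (the daemon name "mds1" and paths are placeholders, not from the post; the runtime change via the admin socket does not persist across restarts, so the values should also be kept in ceph.conf):

    # On the active MDS host: apply the cache limit at runtime and keep it
    # under [mds] in /etc/ceph/ceph.conf as well.
    ceph daemon mds.mds1 config set mds_cache_memory_limit 34359738368   # 32 GB

    # On the two backup servers: client_oc_size is read by userspace clients
    # (ceph-fuse/libcephfs); it goes under [client] in /etc/ceph/ceph.conf.
    cat >> /etc/ceph/ceph.conf <<'EOF'
    [client]
        client_oc_size = 104857600   # 100 MB, default 200 MB
    EOF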
With clients doing backups it's likely that they hold millions of caps. This
is not a good situation to be in. I recommend upgrading to 12.2.12, as we
recently backported a fix for the MDS to limit the number of caps held by a
client to 1M. Additionally, trimming the cache and recalling caps are now
throttled. This may help a lot for your workload. Note that these fixes
haven't been backported to Mimic yet.

--
Patrick Donnelly
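A hedged sketch of how to check how many caps each client session currently holds on the active MDS, and of tuning the per-client limit after the upgrade. "mds1" is a placeholder daemon name, and the option name mds_max_caps_per_client is taken from later releases' documentation; verify it exists in your 12.2.12 build before relying on it:

    # List client sessions on the active MDS; the "num_caps" field shows how
    # many caps each client holds (backup clients may be in the millions).
    ceph daemon mds.mds1 session ls | grep -E '"id"|"num_caps"'

    # After upgrading to 12.2.12, the per-client cap limit (1M by default in
    # the backported fix) should be tunable at runtime; option name assumed
    # from later release docs.
    ceph tell mds.mds1 injectargs '--mds_max_caps_per_client 1048576'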
