On Wed, May 8, 2019 at 4:10 AM Stolte, Felix <[email protected]> wrote:
>
> Hi folks,
>
> we are running a Luminous cluster and use CephFS for file services. We
> use Tivoli Storage Manager to back up all data in the Ceph filesystem to tape
> for disaster recovery. The backup runs on two dedicated servers, which mount
> CephFS via the kernel client. In order to complete the backup in time we are
> using 60 backup threads per server. While the backup is running, ceph health
> often changes from “OK” to “2 clients failing to respond to cache pressure”.
> After investigating and doing research in the mailing list I set the
> following parameters:
>
> mds_cache_memory_limit = 34359738368 (32 GB) on MDS Server
>
> client_oc_size = 104857600 (100 MB, default is 200 MB) on Backup Servers
>
> All servers run Ubuntu 18.04 with kernel 4.15.0-47 and Ceph 12.2.11. We
> have 3 MDS servers, 1 active, 2 standby. Changing to multiple active MDS
> servers is not an option, since we are planning to use snapshots. CephFS
> holds 78,815,975 files.
>
> Any advice on getting rid of the warning would be very much appreciated. On a
> side note: although mds_cache_memory_limit is set to 32 GB, htop shows 60 GB
> memory usage for the ceph-mds process.

With clients doing backup it's likely that they hold millions of caps.
This is not a good situation to be in. I recommend upgrading to
12.2.12 as we recently backported a fix for the MDS to limit the
number of caps held by clients to 1M. Additionally, trimming the cache
and recalling caps is now throttled. This may help a lot for your
workload.
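
As a quick sanity check (a sketch, not part of the original advice; replace
mds.<id> with the name of your active MDS daemon), you can see how many caps
each client session currently holds by querying the MDS admin socket on the
host running the active MDS:

  ceph daemon mds.<id> session ls | grep -E '"id"|"num_caps"'

Sessions reporting num_caps in the millions are typically the ones the MDS
keeps asking to release caps, which is what triggers the "failing to respond
to cache pressure" warning.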

Note that these fixes haven't been backported to Mimic yet.
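
For reference after the upgrade (an assumption on my part, please verify
against the 12.2.12 release notes): the backported per-client limit should
correspond to the mds_max_caps_per_client option (default 1M), which could be
lowered further in ceph.conf on the MDS hosts if 1M caps per client is still
too many, e.g.:

  [mds]
  mds_max_caps_per_client = 500000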

-- 
Patrick Donnelly
