Hi,
I'm still new to Ceph and am seeing similar problems with CephFS.
ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
on Debian GNU/Linux buster/sid
# ceph health detail
HEALTH_WARN 1 MDSs report slow requests; 1 MDSs behind on trimming
MDS_SLOW_REQUEST 1 MDSs report slow requests
    mdsmds3(mds.0): 13 slow requests are blocked > 30 secs
MDS_TRIM 1 MDSs behind on trimming
    mdsmds3(mds.0): Behind on trimming (33924/125) max_segments: 125, num_segments: 33924
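I assume more detail about the blocked requests could be pulled from the MDS admin socket on the mds3 host, roughly like this (mds.mds3 is the daemon name from the output above; whether these dumps actually point at the cause is a guess on my part):
# ceph daemon mds.mds3 dump_ops_in_flight
# ceph daemon mds.mds3 dump_blocked_ops
# ceph daemon mds.mds3 objecter_requests
The last command should show whether the MDS is itself waiting on OSD requests.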
The workload is "doveadm backup" of more than 500 mail folders from a local
ext4 to a cephfs.
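For reference, each user is transferred with an invocation roughly like the following (the exact options and the CephFS destination path are placeholders from memory, not copied from the job):
# doveadm backup -u <user> maildir:/mnt/cephfs/mailbackup/<user>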
* There are ~180'000 files with a strange file size distribution:
# NumSamples = 181056; MIN_SEEN = 377; MAX_SEEN = 584835624
# Mean = 4477785.646005; Variance = 31526763457775.421875; SD = 5614869.852256
377 - 262502 [ 56652]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 31.29%
262502 - 524627 [ 4891]: ∎∎∎∎ 2.70%
524627 - 786752 [ 3498]: ∎∎∎ 1.93%
786752 - 1048878 [ 2770]: ∎∎∎ 1.53%
1048878 - 1311003 [ 2460]: ∎∎ 1.36%
1311003 - 1573128 [ 2197]: ∎∎ 1.21%
1573128 - 1835253 [ 2014]: ∎∎ 1.11%
1835253 - 2097378 [ 1961]: ∎∎ 1.08%
2097378 - 2359503 [ 2244]: ∎∎ 1.24%
2359503 - 2621628 [ 1890]: ∎∎ 1.04%
2621628 - 2883754 [ 1897]: ∎∎ 1.05%
2883754 - 3145879 [ 2188]: ∎∎ 1.21%
3145879 - 3408004 [ 2579]: ∎∎ 1.42%
3408004 - 3670129 [ 3396]: ∎∎∎ 1.88%
3670129 - 3932254 [ 5173]: ∎∎∎∎ 2.86%
3932254 - 4194379 [ 24847]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 13.72%
4194379 - 4456505 [ 1512]: ∎∎ 0.84%
4456505 - 4718630 [ 1394]: ∎∎ 0.77%
4718630 - 4980755 [ 1412]: ∎∎ 0.78%
4980755 - 584835624 [ 56081]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 30.97%
* There are two snapshots of the main directory the mails are backed up to.
* There are three subdirectories in which a simple ls never returns.
* The cephfs is mounted using the kernel driver of Ubuntu 18.04.2 LTS kernel
4.15.0-48-generic.
* The same behaviour occurs with ceph-fuse (FUSE library version 2.9.7), with the
difference that I can't interrupt the ls.
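For completeness, the mounts look roughly like this (the monitor name is taken from the quorum below; mount point and secret file paths are placeholders):
# mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
# ceph-fuse -m mon1:6789 /mnt/cephfs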
Reducing the number of active MDS daemons for our CephFS to 1 made no difference.
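If I remember correctly, the reduction was done like this (cephfs_1 is the file system name shown in the status output below):
# ceph fs set cephfs_1 max_mds 1
# ceph fs status cephfs_1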
The number of segments is still rising.
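I assume I could raise the trim threshold to silence the warning, e.g. with the command below, but as long as requests are stuck that would presumably only hide the problem:
# ceph config set mds mds_log_max_segments 256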
# ceph -w
  cluster:
    id:     6cba13d1-b814-489c-9aac-9c04aaf78720
    health: HEALTH_WARN
            1 MDSs report slow requests
            1 MDSs behind on trimming

  services:
    mon: 3 daemons, quorum mon1,mon2,mon3 (age 3d)
    mgr: cephsible(active, since 27h), standbys: mon3, mon1
    mds: cephfs_1:2 {0=mds3=up:active,1=mds2=up:stopping} 1 up:standby
    osd: 30 osds: 30 up (since 4w), 30 in (since 5w)

  data:
    pools:   5 pools, 393 pgs
    objects: 607.74k objects, 1.5 TiB
    usage:   6.9 TiB used, 160 TiB / 167 TiB avail
    pgs:     393 active+clean
2019-05-03 11:40:17.916193 mds.mds3 [WRN] 15 slow requests, 0 included below; oldest blocked for > 342610.193367 secs
It seems that stopping one of the two active MDS daemons (up:stopping above) never completes.
How can I debug this?
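Would raising the MDS debug level and watching its log be the right approach? Something like the following (the levels are a guess, the log path is the default location):
# ceph tell mds.mds3 injectargs '--debug_mds 10 --debug_ms 1'
# tail -f /var/log/ceph/ceph-mds.mds3.log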
Thanks in advance.
Lars