Hi,
We upgraded one Ceph cluster from 17.2.7 to 18.2.0, and since then we have
been having CephFS issues.
For example this morning:
"""
[root@naret-monitor01 ~]# ceph -s
cluster:
id: 63334166-d991-11eb-99de-40a6b72108d0
health: HEALTH_WARN
1 filesystem is degraded
3 clients failing to advance oldest client/flush tid
3 MDSs report slow requests
6 pgs not scrubbed in time
29 daemons have recently crashed
…
"""
The ceph orch, ceph crash, and ceph fs status commands were hanging.
After a "ceph mgr fail" those commands started to respond again.
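For completeness, the workaround was simply failing over the active mgr
(without an argument, "ceph mgr fail" targets the currently active manager):
"""
# fail the active ceph-mgr so that a standby takes over;
# this unblocked the hanging orch/crash/fs status commands for us
ceph mgr fail
"""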
Then I noticed that one MDS had most of the slow operations:
"""
[WRN] MDS_SLOW_REQUEST: 3 MDSs report slow requests
    mds.cephfs.naret-monitor01.nuakzo(mds.0): 18 slow requests are blocked > 30 secs
    mds.cephfs.naret-monitor01.uvevbf(mds.1): 1683 slow requests are blocked > 30 secs
    mds.cephfs.naret-monitor02.exceuo(mds.2): 1 slow requests are blocked > 30 secs
"""
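If it helps with the diagnosis, I can also dump the blocked and in-flight
operations from that MDS (assuming its tell interface still responds), e.g.:
"""
# dump the requests currently blocked on the slow MDS
ceph tell mds.cephfs.naret-monitor01.uvevbf dump_blocked_ops
# dump everything currently in flight, with age and state
ceph tell mds.cephfs.naret-monitor01.uvevbf dump_ops_in_flight
"""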
Then I tried to restart it with
"""
[root@naret-monitor01 ~]# ceph orch daemon restart mds.cephfs.naret-monitor01.uvevbf
Scheduled to restart mds.cephfs.naret-monitor01.uvevbf on host 'naret-monitor01'
"""
After that, the CephFS entered this state:
"""
[root@naret-monitor01 ~]# ceph fs status
cephfs - 198 clients
======
RANK      STATE                      MDS                    ACTIVITY       DNS    INOS   DIRS   CAPS
 0        active         cephfs.naret-monitor01.nuakzo   Reqs:    0 /s   17.2k  16.2k   1892  14.3k
 1        active         cephfs.naret-monitor02.ztdghf   Reqs:    0 /s   28.1k  10.3k    752   6881
 2     clientreplay      cephfs.naret-monitor02.exceuo                   63.0k   6491    541     66
 3        active         cephfs.naret-monitor03.lqppte   Reqs:    0 /s   16.7k  13.4k   8233    990
POOL TYPE USED AVAIL
cephfs.cephfs.meta metadata 5888M 18.5T
cephfs.cephfs.data data 119G 215T
cephfs.cephfs.data.e_4_2 data 2289G 3241T
cephfs.cephfs.data.e_8_3 data 9997G 470T
STANDBY MDS
cephfs.naret-monitor03.eflouf
cephfs.naret-monitor01.uvevbf
MDS version: ceph version 18.2.0 (5dd24139a1eada541a3bc16b6941c5dde975e26d) reef (stable)
"""
The file system is totally unresponsive: we can mount it on client nodes,
but any operation, even a simple ls, hangs.
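Given that rank 2 is stuck in clientreplay and we also have the "clients
failing to advance oldest client/flush tid" warning, I suppose the next step
would be to look at the client sessions on that MDS; something along these
lines (the session id in the evict example is a placeholder):
"""
# list client sessions on the rank stuck in clientreplay,
# looking for clients that do not advance their oldest tid
ceph tell mds.cephfs.naret-monitor02.exceuo session ls
# if a single client turns out to be the culprit, it could be
# evicted (placeholder session id)
ceph tell mds.cephfs.naret-monitor02.exceuo client evict id=<session-id>
"""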
During the night we had a lot of MDS crashes; I can share their content.
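In case it is useful, I would pull them from the crash module (the crash id
below is a placeholder):
"""
# list the crashes collected by the crash module
ceph crash ls
# show full metadata and backtrace for one crash
ceph crash info <crash-id>
"""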
Does anybody have any ideas on how to tackle this problem?
Best,
Giuseppe