Hi all,
We have seen this several times: some of our MDS daemons stop reporting stats
for no obvious reason. A rolling restart of the affected MDS daemons resolves
it, but restarting an active MDS can cause up to several minutes of downtime,
so we don't want to do this regularly.
The client count and MDS version info are also missing from “ceph fs status”
and the web dashboard, and Prometheus metrics are affected too. However, “ceph
tell mds.cephfs.gpu018.ovxvoz session ls” reports the correct client sessions.
Also, the new "cephfs-top" does not work for us; it only shows a lot of N/A. I
don't know whether this is related.
Apart from this, the actual metadata operations seem to work fine.
How can I identify the root cause? Is this a known bug?
Thanks,
Weiwen Hu
$ ceph fs status
cephfs - 0 clients
======
RANK  STATE           MDS                   ACTIVITY      DNS    INOS   DIRS   CAPS
0     active          cephfs.gpu018.ovxvoz  Reqs:  0 /s   0      0      0      0
1     active          cephfs.gpu006.ddpekw  Reqs:  0 /s   0      0      0      0
1-s   standby-replay  cephfs.gpu023.aetiph  Evts:  0 /s   0      0      0      0
0-s   standby-replay  cephfs.gpu024.rpfbnh  Evts: 69 /s   2242k  2242k  11.5k  0

POOL                      TYPE      USED   AVAIL
cephfs.cephfs.meta        metadata  127G   523G
cephfs.cephfs.data        data      74.6T  15.8T
cephfs.cephfs.data_ssd    data      0      785G
cephfs.cephfs.data_mixed  data      8768G  523G

VERSION                                                                          DAEMONS
None                                                                             cephfs.gpu018.ovxvoz, cephfs.gpu006.ddpekw, cephfs.gpu023.aetiph
ceph version 16.2.5 (0883bdea7337b95e4b611c768c0279868462204a) pacific (stable)  cephfs.gpu024.rpfbnh

Note the many zeros: three of the four MDS daemons report no stats and are
missing version info (they are listed under "None").
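
Until the root cause is found, a small check along these lines might help spot
which daemons have gone silent without reading the table by eye. This is only a
minimal sketch, not part of Ceph: the function name `silent_mds` and the regex
are hypothetical, it assumes the plain-text row layout of the "ceph fs status"
output shown above, and it treats "DNS, INOS and DIRS are all 0" as a heuristic
for "not reporting".

```python
import re

# One MDS row of the plain-text `ceph fs status` table, for example:
#   0  active  cephfs.gpu018.ovxvoz  Reqs:  0 /s  0  0  0  0
# ASSUMPTION: the column order matches the output pasted in this post.
ROW = re.compile(
    r"^\s*(?P<rank>\S+)\s+(?P<state>\S+)\s+(?P<name>\S+)\s+"
    r"(?:Reqs|Evts):\s*(?P<rate>\S+)\s*/s\s+"
    r"(?P<dns>\S+)\s+(?P<inos>\S+)\s+(?P<dirs>\S+)\s+(?P<caps>\S+)\s*$"
)


def silent_mds(status_text):
    """Return the names of MDS daemons whose DNS, INOS and DIRS are all "0".

    Heuristic only: an MDS that is serving a file system would normally have
    at least some dentries and inodes in cache.
    """
    silent = []
    for line in status_text.splitlines():
        m = ROW.match(line)
        if m and all(m.group(k) == "0" for k in ("dns", "inos", "dirs")):
            silent.append(m.group("name"))
    return silent


# Rows copied from the output in this post.
SAMPLE = """\
0 active cephfs.gpu018.ovxvoz Reqs: 0 /s 0 0 0 0
1 active cephfs.gpu006.ddpekw Reqs: 0 /s 0 0 0 0
1-s standby-replay cephfs.gpu023.aetiph Evts: 0 /s 0 0 0 0
0-s standby-replay cephfs.gpu024.rpfbnh Evts: 69 /s 2242k 2242k 11.5k 0
"""

print(silent_mds(SAMPLE))
# ['cephfs.gpu018.ovxvoz', 'cephfs.gpu006.ddpekw', 'cephfs.gpu023.aetiph']
```

Feeding it the real command output (for example captured from a cron job)
would flag the same three daemons that show "None" in the version table.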