[ceph-users] MDS stuck in rejoin

Frank Schilder Thu, 20 Jul 2023 07:11:25 -0700

Hi all,

we had a client with the warning "[WRN] MDS_CLIENT_OLDEST_TID: 1 clients 
failing to advance oldest client/flush tid". I looked at the client and there 
was nothing going on, so I rebooted it. After the client was back, the message 
was still there. To clean this up I failed the MDS. Unfortunately, the MDS that 
took over remained stuck in rejoin without doing anything. All that happened 
in the log was:


[root@ceph-10 ceph]# tail -f ceph-mds.ceph-10.log
2023-07-20T15:54:29.147+0200 7fedb9c9f700  1 mds.2.896604 rejoin_start
2023-07-20T15:54:29.161+0200 7fedb9c9f700  1 mds.2.896604 rejoin_joint_start
2023-07-20T15:55:28.005+0200 7fedb9c9f700  1 mds.ceph-10 Updating MDS map to version 896614 from mon.4
2023-07-20T15:56:00.278+0200 7fedb9c9f700  1 mds.ceph-10 Updating MDS map to version 896615 from mon.4
[...]
2023-07-20T16:02:54.935+0200 7fedb9c9f700  1 mds.ceph-10 Updating MDS map to version 896653 from mon.4
2023-07-20T16:03:07.276+0200 7fedb9c9f700  1 mds.ceph-10 Updating MDS map to version 896654 from mon.4
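
To put a number on "stuck", here is a minimal Python sketch that measures the time between the last rejoin_start entry and the most recent timestamped line of such a log. It assumes the timestamp format shown in the excerpt above; the function name seconds_since_rejoin_start is hypothetical, and the sketch only compares timestamps, it does not know whether the daemon later left rejoin.

```python
from datetime import datetime

# Timestamp layout as it appears in the log excerpt, e.g. 2023-07-20T15:54:29.147+0200
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def seconds_since_rejoin_start(lines):
    """Seconds between the last 'rejoin_start' line and the last timestamped line.

    Returns None if no 'rejoin_start' line is found. (Hypothetical helper.)
    """
    start = None
    last = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        try:
            ts = datetime.strptime(fields[0], TS_FORMAT)
        except ValueError:
            # Not a timestamped line: a wrapped continuation, a prompt, or "[...]"
            continue
        last = ts
        if fields[-1] == "rejoin_start":
            start = ts
    if start is None:
        return None
    return (last - start).total_seconds()


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py /path/to/ceph-mds.<name>.log
    with open(sys.argv[1]) as fh:
        print(seconds_since_rejoin_start(fh))
```

For the excerpt above this gives 518.129 seconds, i.e. a bit more than 8.5 minutes in rejoin with nothing but MDS map updates in between.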

After some time I decided to give another fail a try and, this time, the 
replacement daemon went to active state really fast.

If I get a warning like the above, what is the proper way of getting the client 
clean again (version: 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) 
octopus (stable))?

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
