Hi,

I'm currently running Ceph Luminous 12.2.5.

This morning I tried running Multi MDS with:
ceph fs set <fs_name> max_mds 2

I have 5 MDS servers. After running the above command,
I had 2 active MDSs, 2 standby-active and 1 standby.
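
For reference, the MDS layout described above can be inspected with the standard status commands. This is a minimal sketch, not output from my cluster: the file system name "cephfs" is a placeholder, and the run/DRY_RUN wrapper is a hypothetical helper that only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: prints the ceph commands by default instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Show the MDS states for one file system.
# $1 is the file system name ("cephfs" below is a placeholder).
show_mds_state() {
    fs_name="$1"
    run ceph mds stat              # one-line summary of MDS states
    run ceph fs get "$fs_name"     # MDS map for this fs, including max_mds
    run ceph fs status "$fs_name"  # per-rank table (needs a running ceph-mgr)
}

show_mds_state cephfs
```

Run it with DRY_RUN=0 on a node that has an admin keyring to execute the commands for real.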

After I tried a failover on one of the active MDSs, a standby-active did a replay but then crashed (it was reported as laggy or crashed). Memory and CPU usage on that MDS went sky high, and it became unresponsive after some time. I ended up with one active MDS, but was stuck with a degraded filesystem and warning messages about the MDS being behind on trimming.

No additional MDS has become active since then. I tried restarting the last active MDS (because the filesystem was becoming unresponsive and had a load of slow requests), and it never got past replay -> resolve. My MDS cluster still isn't active... :(

What is the "resolve" state? I never saw it before Luminous.
Setting debug to 20 doesn't tell me much.

I also tried removing the multi-MDS setup, but my CephFS cluster won't go active. How can I get my CephFS up and running again in an active state?
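
To be concrete about removing the multi-MDS setup: as far as I understand the Luminous procedure, it means lowering max_mds and then deactivating the extra ranks. The sketch below assumes a file system named "cephfs" (a placeholder) with 2 active ranks; the run/DRY_RUN wrapper is a hypothetical helper that only prints the commands by default. Presumably this only works cleanly while the ranks are healthy, which is not the case here.

```shell
#!/bin/sh
# Sketch only: prints the ceph commands by default instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Reduce a file system back to a single active MDS (Luminous-era commands).
# $1: file system name, $2: current number of active ranks.
shrink_to_single_mds() {
    fs_name="$1"
    ranks="$2"
    run ceph fs set "$fs_name" max_mds 1
    # Deactivate the highest rank first, down to rank 1; rank 0 stays active.
    # On a real cluster, wait for each rank to finish stopping before the next.
    rank=$((ranks - 1))
    while [ "$rank" -ge 1 ]; do
        run ceph mds deactivate "$fs_name:$rank"
        rank=$((rank - 1))
    done
}

shrink_to_single_mds cephfs 2
```

With the defaults this only prints the two commands; set DRY_RUN=0 to actually issue them.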

Please help.


_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
