On Wed, Jun 27, 2018 at 6:16 PM Dennis Kramer (DT) <den...@holmes.nl> wrote:
>
> Hi,
>
> Currently I'm running Ceph Luminous 12.2.5.
>
> This morning I tried running Multi MDS with:
> ceph fs set <fs_name> max_mds 2
>
> I have 5 MDS servers. After running above command,
> I had 2 active MDSs, 2 standby-active and 1 standby.
>
> After triggering a failover on one of the active MDSs, a standby-active
> MDS started a replay but became laggy or crashed. Memory and CPU usage
> on that MDS went sky high and it became unresponsive after some time. I
> ended up with the one active MDS but got stuck with a degraded
> filesystem and warning messages about the MDS being behind on trimming.
>
> I haven't gotten any additional MDS active since then. I tried restarting
> the last active MDS (because the filesystem was becoming unresponsive and
> had a load of slow requests), and it never got past replay -> resolve. My
> MDS cluster still isn't active... :(

What is the 'ceph -w' output? If you have enabled multi-active MDS, all
MDS ranks need to enter the 'resolve' state before they can continue to
recover.
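For reference, a quick way to see which state each rank is in is the
standard ceph CLI (exact output fields vary by release; this is just a
diagnostic sketch, not a fix):

```shell
# Show the filesystem and the state of each MDS rank
# (e.g. up:replay, up:resolve, up:rejoin, up:active).
# With multi-active MDS, every rank must reach up:resolve
# before recovery can proceed past that point.
ceph fs status
ceph mds stat

# Watch cluster events live while the ranks transition:
ceph -w
```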



>
> What is the "resolve" state? I have never seen that before pre-Luminous.
> Debug on 20 doesn't give me much.
>
> I also tried removing the multi-MDS setup, but my CephFS cluster won't go
> active. How can I get my CephFS up and running again in an active state?
>
> Please help.
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
