Updated to 12.2.5

We are starting to test multi-MDS CephFS and are working through some
failure scenarios in our test cluster.

We are simulating a power failure on one machine, and we are getting mixed
results as to what happens to the file system.

This is the MDS status once we simulate the power loss; note that there are
no standbys left to take over.

mds: cephfs-2/2/2 up
{0=CephDeploy100=up:active,1=TigoMDS100=up:active(laggy or crashed)}
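
As far as we understand it (please correct us if this is wrong), the mons
mark a rank "laggy or crashed" when they stop receiving beacons from that
MDS: the daemon beacons every mds_beacon_interval seconds and gets flagged
after mds_beacon_grace seconds of silence, so a hard power-off always shows
up as "laggy or crashed" first because the mons cannot tell a slow daemon
from a dead one. The values below are the stock defaults as far as we know
(ceph.conf sketch; we keep them under [global] so both the mons and the MDS
daemons see them):

    [global]
    # beacon sent by each MDS every 4 seconds (default)
    mds beacon interval = 4
    # rank flagged "laggy or crashed" after 15 seconds without a beacon (default)
    mds beacon grace = 15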

1. From this line alone it is a little unclear whether the MDS is merely
laggy or really down (the commands sketched after this list are how we have
been trying to tell the difference).
2. The first time, we lost all access to the CephFS mount and I/O simply
blocked.
3. Another time, we were still able to access the mount and everything
seemed to keep running.
4. Another time, we had a script creating a bunch of files, simulated the
crash, then listed the directory and it showed 0 files, when it should have
shown lots of files.
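
For reference, these are the commands we have been using to get more detail
than that one-line summary (the daemon and filesystem names are the ones
from our status output above):

    # overall cluster health, including any MDS-related warnings
    ceph health detail

    # full MDS map: per-rank state, laggy flags, remaining standbys
    ceph fs dump

    # run on the MDS host itself: if the admin socket no longer answers,
    # the daemon really is dead rather than just laggy
    ceph daemon mds.TigoMDS100 status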

We could go into the details of each of those, but really I am trying to
understand Ceph's logic for dealing with a crashed MDS in a multi-MDS
setup: is the rank marked failed, is the filesystem marked degraded, or
what exactly happens?

It just seems a little unclear what is going to happen.
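
Our working assumption (please correct it if wrong) is that with no standby
available, rank 1 just stays marked failed/laggy and any metadata it was
authoritative for hangs until that daemon returns, which would explain the
blocked I/O we saw. If it matters for the answer, the way we think we would
add a standby so rank 1 can actually fail over is roughly this (the host
name "TigoMDS200" is only a placeholder):

    # deploy one more MDS daemon to act as a standby
    ceph-deploy mds create TigoMDS200

    # ask the cluster to expect at least one standby and warn when it is missing
    ceph fs set cephfs standby_count_wanted 1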

Good news: once it comes back online, everything is as it should be.

Thanks