Hey Dan,

Thank you for the response. The namespace methodology makes more sense now, and I think that explains which parts would be up or not.
Regarding number 4 in my original email, about listing 0 files: I will try to recreate it with debug on and submit an issue if that turns out to be a bug. I am sorry if I have offended anyone with my attitude; I am just trying to get information and understand what is going on. I want Ceph and CephFS to be the best out there.

Thank you all

On Fri, Apr 27, 2018 at 12:14 AM Dan van der Ster <d...@vanderster.com> wrote:
> Hi Scott,
>
> Multi-MDS just assigns different parts of the namespace to different
> "ranks". Each rank (0, 1, 2, ...) is handled by one of the active
> MDSs. (You can query which parts of the namespace are assigned to
> each rank using the jq tricks in [1].) If a rank is down and there are
> no more standbys, then you need to bring up a new MDS to handle that
> down rank. In the meantime, part of the namespace will have IO
> blocked.
>
> To handle these failures, you need to configure sufficient standby
> MDSs to handle the failure scenarios you foresee in your environment.
> A strictly "standby" MDS can take over from *any* of the failed ranks,
> and you can have several "standby" MDSs to cover multiple failures. So
> just run 2 or 3 standbys if you want to be on the safe side.
>
> You can also configure "standby-for-rank" MDSs -- that is, a given
> standby MDS can watch a specific rank and take over if the MDS
> serving that rank fails. Those standby-for-rank MDSs can even be "hot"
> standbys to speed up the failover process.
>
> An active MDS for a given rank does not act as a standby for the other
> ranks. I'm not sure whether it *could* following some code changes, but
> in any case that's just not how it works today.
>
> Does that clarify things?
>
> Cheers, Dan
>
> [1] https://ceph.com/community/new-luminous-cephfs-subtree-pinning/
>
>
> On Fri, Apr 27, 2018 at 4:04 AM, Scottix <scot...@gmail.com> wrote:
> > Ok, let me try to explain this better; we are going back and forth
> > and it's not going anywhere.
> > I'll just be as genuine as I can and explain the issue.
> >
> > What we are testing is a critical failure scenario, and actually more of a
> > real-world scenario: basically, what happens when it is 1 AM and the shit
> > hits the fan, half of your servers are down, and 1 of the 3 MDS boxes is
> > still alive.
> > There is one very important fact about CephFS when the
> > single active MDS server fails: it is guaranteed that 100% of IO is blocked.
> > No split-brain, no corrupted data -- 100% guaranteed, ever since we started
> > using CephFS.
> >
> > Now with multi_mds, I understand this changes the logic, and I understand
> > how difficult and how hard this problem is; trust me, I would not be able to
> > tackle it. Basically, I need to answer the question: what happens when 1 of
> > 2 multi_mds ranks fails with no standbys ready to come save it?
> > What I have tested is not the same as a single active MDS; this absolutely
> > changes the logic of what happens and how we troubleshoot. The CephFS is
> > still alive, it does allow operations, and it does allow resources to go
> > through. How, why, and what is affected are very relevant questions if this
> > is what the failure looks like, since it is not 100% blocking.
> >
> > This is the problem: I have programs writing a massive amount of data, and I
> > don't want it corrupted or lost. I need to know what happens, and I need to
> > have guarantees.
> >
> > Best
> >
> >
> > On Thu, Apr 26, 2018 at 5:03 PM Patrick Donnelly <pdonn...@redhat.com>
> > wrote:
> >>
> >> On Thu, Apr 26, 2018 at 4:40 PM, Scottix <scot...@gmail.com> wrote:
> >> >> Of course -- the mons can't tell the difference!
> >> > That is really unfortunate; it would be nice to know if the filesystem
> >> > has been degraded, and to what degree.
> >>
> >> If a rank is laggy/crashed, the file system as a whole is generally
> >> unavailable. The span between partial outage and full is small and not
> >> worth quantifying.
> >>
> >> >> You must have standbys for high availability. This is in the docs.
> >> > Ok, but what if your standby goes down and a master goes down? This
> >> > could happen in the real world and is a valid error scenario.
> >> > Also, there is a period before the standby becomes active; what
> >> > happens in between during that time?
> >>
> >> The standby MDS goes through a series of states where it recovers the
> >> lost state and connections with clients. Finally, it goes active.
> >>
> >> >> It depends(tm) on how the metadata is distributed and what locks are
> >> >> held by each MDS.
> >> > You're saying that, depending on which MDS had a lock on a resource,
> >> > it will block that particular POSIX operation? Can you clarify a
> >> > little bit?
> >> >
> >> >> Standbys are not optional in any production cluster.
> >> > Of course in production I would hope people have standbys, but in
> >> > theory there is no enforcement of this in Ceph other than a warning.
> >> > So when you say "not optional", that is not exactly true; it will
> >> > still run.
> >>
> >> It's self-defeating to expect CephFS to enforce having standbys --
> >> presumably by throwing an error or becoming unavailable -- when the
> >> standbys exist to make the system available.
> >>
> >> There's nothing to enforce. A warning is sufficient to tell the operator
> >> that (a) they didn't configure any standbys, or (b) MDS daemon
> >> processes/boxes are going away and not coming back as standbys (i.e.
> >> the pool of MDS daemons is decreasing with each failover).
> >>
> >> --
> >> Patrick Donnelly
> >
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
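[Editor's note] For anyone setting up the standby and standby-for-rank behavior discussed in the quoted thread: in the Luminous era these were per-daemon options in ceph.conf. A hedged sketch -- the daemon names here are placeholders, and the option names are the pre-Nautilus ones, so check your release's documentation:

```ini
# Hypothetical ceph.conf fragment (Luminous-era option names; the
# section names "mds.b" and "mds.c" are invented for illustration).

[mds.b]
# Follow rank 0 and take over if the MDS serving that rank fails.
mds_standby_for_rank = 0
# Replay rank 0's journal continuously, keeping a warm cache
# ("hot" standby) so failover is faster.
mds_standby_replay = true

[mds.c]
# No standby_for_* options set: a general standby that can take over
# from *any* failed rank.
```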
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
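[Editor's note] The per-rank namespace assignment Dan points to in [1] comes from `ceph tell mds.<name> get subtrees`, whose JSON output can be post-processed with jq or any JSON tool. A minimal sketch in Python against a canned sample -- the sample data is invented for illustration, and a real cluster's output carries many more fields:

```python
import json

# Canned stand-in for `ceph tell mds.<name> get subtrees` output.
# The fields "dir.path" and "auth_first" mirror what the jq tricks in
# [1] query; the paths and ranks below are made-up example values.
SAMPLE = """
[
  {"dir": {"path": "/projects"}, "auth_first": 0},
  {"dir": {"path": "/home"},     "auth_first": 1},
  {"dir": {"path": ""},          "auth_first": 0}
]
"""

def subtree_ranks(subtrees_json: str) -> dict:
    """Map each subtree path to the rank that is authoritative for it.

    The empty path is reported for the filesystem root, so it is shown
    as "/" for readability.
    """
    return {s["dir"]["path"] or "/": s["auth_first"]
            for s in json.loads(subtrees_json)}

if __name__ == "__main__":
    for path, rank in sorted(subtree_ranks(SAMPLE).items()):
        print(f"rank {rank}  {path}")
```

In this sketch, if rank 1 went down with no standby available, IO under /home would block while /projects stayed served -- matching Dan's description of a partial-namespace outage.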