Hey Dan,

Thank you for the response. The namespace methodology makes more sense now, and I think that explains which parts would be up or not.
Regarding number 4 in my original email, about listing 0 files: I will try to recreate it with debug on and submit an issue if that turns out to be a bug. I am sorry if I have offended anyone with my attitude; I am just trying to get information and understand what is going on. I want Ceph and CephFS to be the best out there.

Thank you all

On Fri, Apr 27, 2018 at 12:14 AM Dan van der Ster <d...@vanderster.com> wrote:
> Hi Scott,
>
> Multi-MDS just assigns different parts of the namespace to different
> "ranks". Each rank (0, 1, 2, ...) is handled by one of the active
> MDSs. (You can query which parts of the namespace are assigned to
> each rank using the jq tricks in [1].) If a rank is down and there are
> no more standbys, then you need to bring up a new MDS to handle that
> down rank. In the meantime, part of the namespace will have IO
> blocked.
>
> To handle these failures, you need to configure sufficient standby
> MDSs to handle the failure scenarios you foresee in your environment.
> A strictly "standby" MDS can take over from *any* of the failed ranks,
> and you can have several "standby" MDSs to cover multiple failures. So
> just run 2 or 3 standbys if you want to be on the safe side.
>
> You can also configure "standby-for-rank" MDSs -- that is, a given
> standby MDS can watch a specific rank and take over if the MDS
> serving that rank fails. Those standby-for-rank MDSs can even be "hot"
> standbys to speed up the failover process.
>
> An active MDS for a given rank does not act as a standby for the other
> ranks. I'm not sure whether it *could* following some code changes, but
> in any case that's just not how it works today.
>
> Does that clarify things?
>
> Cheers, Dan
>
> [1] https://ceph.com/community/new-luminous-cephfs-subtree-pinning/
>
>
> On Fri, Apr 27, 2018 at 4:04 AM, Scottix <scot...@gmail.com> wrote:
> > Ok, let me try to explain this better; we are going back and forth
> > and it's not going anywhere.
> > I'll just be as genuine as I can and explain the issue.
> >
> > What we are testing is a critical failure scenario, and actually more of a
> > real-world scenario: basically, what happens when it is 1 AM and the shit
> > hits the fan, half of your servers are down, and 1 of the 3 MDS boxes is
> > still alive.
> > There is one very important fact about CephFS when the
> > single active MDS server fails: it is guaranteed that 100% of IO is blocked.
> > No split-brain, no corrupted data -- 100% guaranteed, ever since we started
> > using CephFS.
> >
> > Now with multi_mds, I understand this changes the logic, and I understand
> > how difficult and how hard this problem is; trust me, I would not be able to
> > tackle it. Basically, I need to answer the question: what happens when 1 of
> > 2 multi_mds ranks fails with no standbys ready to come save it?
> > What I have tested is not the same as a single active MDS; this absolutely
> > changes the logic of what happens and how we troubleshoot. The CephFS is
> > still alive, it does allow operations, and it does allow resources to go
> > through. How, why, and what is affected are very relevant questions if this
> > is what the failure looks like, since it is not 100% blocking.
> >
> > This is the problem: I have programs writing a massive amount of data, and I
> > don't want it corrupted or lost. I need to know what happens, and I need to
> > have guarantees.
> >
> > Best
> >
> >
> > On Thu, Apr 26, 2018 at 5:03 PM Patrick Donnelly <pdonn...@redhat.com>
> > wrote:
> >>
> >> On Thu, Apr 26, 2018 at 4:40 PM, Scottix <scot...@gmail.com> wrote:
> >> >> Of course -- the mons can't tell the difference!
> >> > That is really unfortunate; it would be nice to know if the filesystem
> >> > has been degraded, and to what degree.
> >>
> >> If a rank is laggy/crashed, the file system as a whole is generally
> >> unavailable. The span between partial outage and full is small and not
> >> worth quantifying.
> >>
> >> >> You must have standbys for high availability. This is in the docs.
> >> > Ok, but what if your standby goes down and a master goes down? This
> >> > could happen in the real world and is a valid error scenario.
> >> > Also, there is a period before the standby becomes active; what
> >> > happens in between during that time?
> >>
> >> The standby MDS goes through a series of states where it recovers the
> >> lost state and connections with clients. Finally, it goes active.
> >>
> >> >> It depends(tm) on how the metadata is distributed and what locks are
> >> >> held by each MDS.
> >> > You're saying that, depending on which MDS had a lock on a resource,
> >> > it will block that particular POSIX operation? Can you clarify a
> >> > little bit?
> >> >
> >> >> Standbys are not optional in any production cluster.
> >> > Of course in production I would hope people have standbys, but in
> >> > theory there is no enforcement of this in Ceph other than a warning.
> >> > So when you say "not optional", that is not exactly true; it will
> >> > still run.
> >>
> >> It's self-defeating to expect CephFS to enforce having standbys --
> >> presumably by throwing an error or becoming unavailable -- when the
> >> standbys exist to make the system available.
> >>
> >> There's nothing to enforce. A warning is sufficient to tell the operator
> >> that (a) they didn't configure any standbys, or (b) MDS daemon
> >> processes/boxes are going away and not coming back as standbys (i.e.
> >> the pool of MDS daemons is decreasing with each failover).
> >>
> >> --
> >> Patrick Donnelly
> >
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
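[Editor's note] For anyone setting up the standby and standby-for-rank behavior discussed in the quoted thread: in the Luminous era these were per-daemon options in ceph.conf. A hedged sketch -- the daemon names here are placeholders, and the option names are the pre-Nautilus ones, so check your release's documentation:

```ini
# Hypothetical ceph.conf fragment (Luminous-era option names; the
# section names "mds.b" and "mds.c" are invented for illustration).

[mds.b]
# Follow rank 0 and take over if the MDS serving that rank fails.
mds_standby_for_rank = 0
# Replay rank 0's journal continuously, keeping a warm cache
# ("hot" standby) so failover is faster.
mds_standby_replay = true

[mds.c]
# No standby_for_* options set: a general standby that can take over
# from *any* failed rank.
```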
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
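[Editor's note] The per-rank namespace assignment Dan points to in [1] comes from `ceph tell mds.<name> get subtrees`, whose JSON output can be post-processed with jq or any JSON tool. A minimal sketch in Python against a canned sample -- the sample data is invented for illustration, and a real cluster's output carries many more fields:

```python
import json

# Canned stand-in for `ceph tell mds.<name> get subtrees` output.
# The fields "dir.path" and "auth_first" mirror what the jq tricks in
# [1] query; the paths and ranks below are made-up example values.
SAMPLE = """
[
  {"dir": {"path": "/projects"}, "auth_first": 0},
  {"dir": {"path": "/home"},     "auth_first": 1},
  {"dir": {"path": ""},          "auth_first": 0}
]
"""

def subtree_ranks(subtrees_json: str) -> dict:
    """Map each subtree path to the rank that is authoritative for it.

    The empty path is reported for the filesystem root, so it is shown
    as "/" for readability.
    """
    return {s["dir"]["path"] or "/": s["auth_first"]
            for s in json.loads(subtrees_json)}

if __name__ == "__main__":
    for path, rank in sorted(subtree_ranks(SAMPLE).items()):
        print(f"rank {rank}  {path}")
```

In this sketch, if rank 1 went down with no standby available, IO under /home would block while /projects stayed served -- matching Dan's description of a partial-namespace outage.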