On Tue, 10 Apr 2018, Patrick Donnelly wrote:
> On Tue, Apr 10, 2018 at 5:54 AM, John Spray <jsp...@redhat.com> wrote:
> > On Tue, Apr 10, 2018 at 1:44 PM, Yan, Zheng <uker...@gmail.com> wrote:
> >> Hello
> >>
> >> To simplify snapshot handling in multiple-active-MDS setups, we changed
> >> the format of snaprealm in mimic dev.
> >> https://github.com/ceph/ceph/pull/16779.
> >>
> >> The new version of the MDS can handle old-format snaprealms in a
> >> single-active setup. It can also convert an old-format snaprealm to
> >> the new format when the snaprealm is modified. The problem is that
> >> the new MDS cannot properly handle old-format snaprealms in a
> >> multiple-active setup: it may crash when it encounters one. For an
> >> existing filesystem with snapshots, upgrading the MDS to mimic seems
> >> to be no problem at first glance. But if the user later enables
> >> multiple active MDSs, the MDS may crash continuously, and there is no
> >> easy way to switch back to a single active MDS.
> >>
> >> I don't have a clear idea of how to handle this situation. I can think
> >> of a few options.
> >>
> >> 1. Forbid multiple active MDSs until all old snapshots are deleted or
> >> all snaprealms are converted to the new format. Format conversion
> >> requires traversing the whole filesystem tree, which is not easy to
> >> implement.
> >
> > This has been a general problem with metadata format changes: we can
> > never know if all the metadata in a filesystem has been brought up to
> > a particular version.  Scrubbing (where scrub does the updates) should
> > be the answer, but we don't have the mechanism for recording/ensuring
> > the scrub has really happened.
> >
> > Maybe we need the MDS to be able to report a complete whole-filesystem
> > scrub to the monitor, and record a field like
> > "latest_scrubbed_version" in FSMap, so that we can be sure that all
> > the filesystem metadata has been brought up to a certain version
> > before enabling certain features?  So we'd then have a
> > "latest_scrubbed_version >= mimic" test before enabling multiple
> > active daemons.
> >
> > For this particular situation, we'd also need to protect against
> > people who had enabled multimds and snapshots experimentally, with an
> > MDS startup check like:
> >  ((ever_allowed_features & CEPH_MDSMAP_ALLOW_SNAPS) &&
> > (allows_multimds() || in.size() >1)) && latest_scrubbed_version <
> > mimic
> 
> This sounds like the right approach to me. The mons should also be
> capable of performing the same test and raising a health error that
> pre-Mimic MDSs must be started and the number of actives be reduced to
> 1.
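
To make the two proposed tests concrete, here is a rough, self-contained
sketch of how they might look. Everything in it is an assumption for
illustration: FsState is a hypothetical stand-in for the real
MDSMap/FSMap state, the flag bit values are placeholders rather than the
real CEPH_MDSMAP_* values, and latest_scrubbed_version is the proposed
field, which does not exist today.

```cpp
#include <cstdint>
#include <set>

// Placeholder flag bits for illustration only; not the real Ceph values.
constexpr uint32_t ALLOW_SNAPS = 1u << 0;     // stands in for CEPH_MDSMAP_ALLOW_SNAPS
constexpr uint32_t ALLOW_MULTIMDS = 1u << 1;  // assumed flag behind allows_multimds()

// Release ordinals used for the comparison (luminous = 12, mimic = 13).
constexpr int RELEASE_LUMINOUS = 12;
constexpr int RELEASE_MIMIC = 13;

// Hypothetical stand-in for the per-filesystem state discussed above.
struct FsState {
  uint32_t ever_allowed_features = 0;  // features that were ever enabled
  uint32_t flags = 0;                  // currently enabled features
  std::set<int> in;                    // ranks currently in the MDS cluster
  int latest_scrubbed_version = 0;     // proposed field; 0 means never scrubbed

  bool allows_multimds() const { return (flags & ALLOW_MULTIMDS) != 0; }
};

// The MDS startup check as quoted: true when snapshots were ever allowed,
// the filesystem is (or may become) multi-active, and no complete scrub at
// mimic or later has been recorded.
bool unsafe_for_new_mds(const FsState& fs) {
  const bool snaps_ever = (fs.ever_allowed_features & ALLOW_SNAPS) != 0;
  const bool multi_active = fs.allows_multimds() || fs.in.size() > 1;
  return snaps_ever && multi_active &&
         fs.latest_scrubbed_version < RELEASE_MIMIC;
}

// The "latest_scrubbed_version >= mimic" gate for enabling multiple
// active daemons.
bool can_enable_multimds(const FsState& fs) {
  return fs.latest_scrubbed_version >= RELEASE_MIMIC;
}
```

The mon-side health check could presumably evaluate the same predicate as
unsafe_for_new_mds() against the FSMap and raise the error when it returns
true.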

Does scrub actually do the conversion already, though, or does that need 
to be implemented?

sage
