Re: [ceph-users] ceph-mds failure replaying journal

Jon Morby (Fido) Mon, 29 Oct 2018 04:14:23 -0700

I've experimented, and whilst the downgrade looks to be working, you end up with 
errors about unsupported features ("mimic" amongst others):


2018-10-29 10:51:20.652047 7f6f1b9f5080 -1 ERROR: on disk data includes 
unsupported features: compat={},rocompat={},incompat={10=mimic ondisk layou 

so I gave up on that idea 

In addition to the cephfs volume (which is basically just mirrors and some 
backups) we have a large rbd deployment using the same ceph cluster, and if we 
lose that we're screwed ... the cephfs volume was more an "experiment" to see 
how viable it would be as an NFS replacement 

There's 26TB of data on there, so I'd rather not have to go off and redownload 
it all .. but losing it isn't the end of the world (but it will piss off a few 
friends) 

Jon 

----- On 29 Oct, 2018, at 09:54, Zheng Yan <[email protected]> wrote: 

> On Mon, Oct 29, 2018 at 5:25 PM Jon Morby (Fido) <[email protected]> wrote:

>> Hi

>> Ideally we'd like to undo the whole accidental upgrade to 13.x and ensure that
>> ceph-deploy doesn't do another major release upgrade without a lot of warnings

>> Either way, I'm currently getting errors that 13.2.1 isn't available / shaman is
>> offline / etc

>> What's the best / recommended way of doing this downgrade across our estate?

> You have already upgraded ceph-mon. I don't know if it can be safely downgraded
> (if I remember right, I corrupted the monitor's data when downgrading ceph-mon from
> mimic to luminous).

>> ----- On 29 Oct, 2018, at 08:19, Yan, Zheng <[email protected]> wrote:

>>> We backported a wrong patch to 13.2.2. Downgrade ceph to 13.2.1, then run
>>> 'ceph mds repaired fido_fs:1'.
>>> Sorry for the trouble
>>> Yan, Zheng

>>> On Mon, Oct 29, 2018 at 7:48 AM Jon Morby <[email protected]> wrote:

>>>> We accidentally found ourselves upgraded from 12.2.8 to 13.2.2 after a
>>>> ceph-deploy install went awry (we were expecting it to upgrade to 12.2.9 and
>>>> not jump a major release without warning)

>>>> Anyway .. as a result, we ended up with an mds journal error and 1 daemon
>>>> reporting as damaged

>>>> Having got nowhere asking for help on IRC, we followed various forum posts
>>>> and disaster recovery guides and ended up resetting the journal. That left
>>>> the daemon no longer “damaged”, but we’re now seeing the mds segfault
>>>> whilst trying to replay

>>>> https://pastebin.com/iSLdvu0b

>>>> /build/ceph-13.2.2/src/mds/journal.cc: 1572: FAILED assert(g_conf->mds_wipe_sessions)

>>>> ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)
>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7fad637f70f2]
>>>> 2: (()+0x3162b7) [0x7fad637f72b7]
>>>> 3: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x5f4b) [0x7a7a6b]
>>>> 4: (EUpdate::replay(MDSRank*)+0x39) [0x7a8fa9]
>>>> 5: (MDLog::_replay_thread()+0x864) [0x752164]
>>>> 6: (MDLog::ReplayThread::entry()+0xd) [0x4f021d]
>>>> 7: (()+0x76ba) [0x7fad6305a6ba]
>>>> 8: (clone()+0x6d) [0x7fad6288341d]
>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
>>>> interpret this.

>>>> full logs

>>>> https://pastebin.com/X5UG9vT2

>>>> We’ve been unable to access the cephfs file system since all of this started.
>>>> Attempts to mount fail with reports that “mds probably not available”

>>>> Oct 28 23:47:02 mirrors kernel: [115602.911193] ceph: probably no mds server is up

>>>> root@mds02:~# ceph -s
>>>>   cluster:
>>>>     id:     78d5bf7d-b074-47ab-8d73-bd4d99df98a5
>>>>     health: HEALTH_WARN
>>>>             1 filesystem is degraded
>>>>             insufficient standby MDS daemons available
>>>>             too many PGs per OSD (276 > max 250)

>>>>   services:
>>>>     mon: 3 daemons, quorum mon01,mon02,mon03
>>>>     mgr: mon01(active), standbys: mon02, mon03
>>>>     mds: fido_fs-2/2/1 up {0=mds01=up:resolve,1=mds02=up:replay(laggy or crashed)}
>>>>     osd: 27 osds: 27 up, 27 in

>>>>   data:
>>>>     pools:   15 pools, 3168 pgs
>>>>     objects: 16.97 M objects, 30 TiB
>>>>     usage:   71 TiB used, 27 TiB / 98 TiB avail
>>>>     pgs:     3168 active+clean

>>>>   io:
>>>>     client: 680 B/s rd, 1.1 MiB/s wr, 0 op/s rd, 345 op/s wr

>>>> Before I just trash the entire fs and give up on ceph, does anyone have any
>>>> suggestions as to how we can fix this?

>>>> root@mds02:~# ceph versions
>>>> {
>>>>     "mon": {
>>>>         "ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 3
>>>>     },
>>>>     "mgr": {
>>>>         "ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 3
>>>>     },
>>>>     "osd": {
>>>>         "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)": 27
>>>>     },
>>>>     "mds": {
>>>>         "ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 2
>>>>     },
>>>>     "overall": {
>>>>         "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)": 27,
>>>>         "ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 8
>>>>     }
>>>> }
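
[Editorial sketch, not part of the original message.] The `ceph versions` output above shows the monitors, managers and MDS daemons on mimic while the OSDs are still on luminous. A mixed state like this could be spotted automatically by parsing that JSON. The Python below is a minimal sketch under stated assumptions: the function names `releases_by_daemon` and `mixed_releases` are hypothetical, the sample is trimmed from the output quoted in this thread, and the regular expression assumes version strings of the form "ceph version X.Y.Z (sha) name (stable)".

```python
import json
import re

# Trimmed copy of the `ceph versions` output quoted in this thread.
SAMPLE = """
{
    "mon": {"ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 3},
    "mgr": {"ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 3},
    "osd": {"ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)": 27},
    "mds": {"ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 2},
    "overall": {
        "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)": 27,
        "ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 8
    }
}
"""

# Assumed shape of a version string: "ceph version 13.2.2 (<sha>) mimic (stable)".
VERSION_RE = re.compile(r"ceph version \d+\.\d+\.\d+ \(\w+\) (\w+)")


def releases_by_daemon(versions_json):
    """Map each daemon type (mon, mgr, osd, mds) to the release names it runs."""
    data = json.loads(versions_json)
    result = {}
    for daemon, counts in data.items():
        if daemon == "overall":
            continue  # "overall" only repeats the per-daemon totals
        names = set()
        for version_string in counts:
            match = VERSION_RE.match(version_string)
            if match:
                names.add(match.group(1))
        result[daemon] = names
    return result


def mixed_releases(versions_json):
    """Return the sorted list of distinct release names across all daemons."""
    found = set()
    for names in releases_by_daemon(versions_json).values():
        found |= names
    return sorted(found)


if __name__ == "__main__":
    releases = mixed_releases(SAMPLE)
    if len(releases) > 1:
        print("cluster is running mixed releases:", ", ".join(releases))
```

In practice the JSON would come from running `ceph versions` on a cluster node rather than from an embedded string; a check like this run before and after a package install would flag an unexpected jump in major release.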

>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> [email protected]
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

>> --

>> Jon Morby
>> FidoNet - the internet made simple!
>> 10 - 16 Tiller Road, London, E14 8PX
>> tel: 0345 004 3050 / fax: 0345 004 3051

>> Need more rack space?
>> Check out our Co-Lo offerings at http://www.fido.net/services/colo/
>> 32 amp racks in London and Brighton
>> Linx ConneXions available at all Fido sites!
>> https://www.fido.net/services/backbone/connexions/
>> PGP Key: 26DC B618 DE9E F9CB F8B7 1EFA 2A64 BA69 B3B5 AD3A
>> http://jonmorby.com/B3B5AD3A.asc

-- 

Jon Morby 
FidoNet - the internet made simple! 
10 - 16 Tiller Road, London, E14 8PX 
tel: 0345 004 3050 / fax: 0345 004 3051 

Need more rack space? 
Check out our Co-Lo offerings at http://www.fido.net/services/colo/
32 amp racks in London and Brighton
Linx ConneXions available at all Fido sites!
https://www.fido.net/services/backbone/connexions/
PGP Key: 26DC B618 DE9E F9CB F8B7 1EFA 2A64 BA69 B3B5 AD3A
http://jonmorby.com/B3B5AD3A.asc

