Some progress, and more pain...

I was able to recover the 200.00000000 object using ceph-objectstore-tool on one of the OSDs (all replicas are identical copies), but re-injecting it with a plain rados put gave no error while a subsequent get still returned the same I/O error. The solution was to rados rm the object and then put it again; that worked.
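
For reference, this is more or less the sequence I used (just a sketch; the OSD data path and the temporary file name are examples, not necessarily the exact ones):

# extract the object from one of the (stopped) OSDs holding pg 10.14
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-23 --pgid 10.14 '200.00000000' get-bytes /tmp/200.00000000.bin

# remove the broken object, then re-inject the recovered copy
rados -p cephfs_metadata rm 200.00000000
rados -p cephfs_metadata put 200.00000000 /tmp/200.00000000.bin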

However, after restarting one of the MDSes and setting it to repaired, I hit another, similar problem:


2018-07-12 17:04:41.999136 7f54c3f4e700 -1 log_channel(cluster) log [ERR] : error reading table object 'mds0_inotable' -5 ((5) Input/output error)


Can I safely try to do the same as for object 200.00000000? Should I check anything before trying it? Again, the copies of the object have identical md5sums on all the replicas.
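
In case it helps, this is roughly how I'm checking the copies (a sketch; the pg id, OSD ids and data paths are placeholders to be taken from the ceph osd map output):

# find which pg / OSDs hold the inotable object
ceph osd map cephfs_metadata mds0_inotable

# on each OSD of the acting set (with the OSD stopped), extract the object
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> --pgid <pgid> 'mds0_inotable' get-bytes /tmp/mds0_inotable.<id>

# then compare the checksums
md5sum /tmp/mds0_inotable.*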

Thanks,


    Alessandro


On 12/07/18 16:46, Alessandro De Salvo wrote:

Unfortunately yes, all the OSDs were restarted a few times, but no change.

Thanks,


    Alessandro


On 12/07/18 15:55, Paul Emmerich wrote:
This might seem like a stupid suggestion, but: have you tried to restart the OSDs?

I've also encountered some random CRC errors that only showed up when reading an object (not during scrubbing) and that magically disappeared after restarting the OSD.

However, in my case it was clearly related to https://tracker.ceph.com/issues/22464, which doesn't seem to be the issue here.

Paul

2018-07-12 13:53 GMT+02:00 Alessandro De Salvo <alessandro.desa...@roma1.infn.it>:


    On 12/07/18 11:20, Alessandro De Salvo wrote:



        On 12/07/18 10:58, Dan van der Ster wrote:

            On Wed, Jul 11, 2018 at 10:25 PM Gregory Farnum
            <gfar...@redhat.com> wrote:

                On Wed, Jul 11, 2018 at 9:23 AM Alessandro De Salvo
                <alessandro.desa...@roma1.infn.it> wrote:

                    OK, I found where the object is:


                    ceph osd map cephfs_metadata 200.00000000
                    osdmap e632418 pool 'cephfs_metadata' (10) object
                    '200.00000000' -> pg
                    10.844f3494 (10.14) -> up ([23,35,18], p23)
                    acting ([23,35,18], p23)


                    So, looking at the osds 23, 35 and 18 logs in
                    fact I see:


                    osd.23:

                    2018-07-11 15:49:14.913771 7efbee672700 -1
                    log_channel(cluster) log
                    [ERR] : 10.14 full-object read crc 0x976aefc5 !=
                    expected 0x9ef2b41b on
                    10:292cf221:::200.00000000:head


                    osd.35:

                    2018-07-11 18:01:19.989345 7f760291a700 -1
                    log_channel(cluster) log
                    [ERR] : 10.14 full-object read crc 0x976aefc5 !=
                    expected 0x9ef2b41b on
                    10:292cf221:::200.00000000:head


                    osd.18:

                    2018-07-11 18:18:06.214933 7fabaf5c1700 -1
                    log_channel(cluster) log
                    [ERR] : 10.14 full-object read crc 0x976aefc5 !=
                    expected 0x9ef2b41b on
                    10:292cf221:::200.00000000:head


                    So, basically the same error everywhere.

                    I'm trying to issue a repair of the pg 10.14,
                    but I'm not sure whether it will help.

                    No SMART errors (the fileservers are SANs, in
                    RAID6 + LVM volumes), and
                    no disk problems anywhere. There are no relevant
                    errors in the syslogs; the hosts are just fine.
                    I cannot exclude an error on the RAID
                    controllers, but 2 of the OSDs with 10.14 are on
                    one SAN system and one is on a different one, so
                    I would tend to exclude that they both had
                    (silent) errors at the same time.


                That's fairly distressing. At this point I'd probably
                try extracting the object using ceph-objectstore-tool
                and seeing if it decodes properly as an mds journal.
                If it does, you might risk just putting it back in
                place to overwrite the crc.

            Wouldn't it be easier to scrub repair the PG to fix the crc?


        This is what I already instructed the cluster to do (a deep
        scrub), but I'm not sure it can repair anything when all
        replicas are bad, as seems to be the case here.


    I finally managed (with the help of Dan) to perform the
    deep-scrub on pg 10.14, but the deep scrub did not detect
    anything wrong. Trying to repair 10.14 also has no effect.
    Still, when trying to access the object I get this in the OSDs:

    2018-07-12 13:40:32.711732 7efbee672700 -1 log_channel(cluster)
    log [ERR] : 10.14 full-object read crc 0x976aefc5 != expected
    0x9ef2b41b on 10:292cf221:::200.00000000:head
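
    For completeness, the scrub and repair were triggered with the
    usual commands, more or less:

    ceph pg deep-scrub 10.14
    ceph pg repair 10.14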

    Was deep-scrub supposed to detect the wrong crc? If yes, then it
    sounds like a bug.
    Can I force the repair somehow?
    Thanks,

       Alessandro



            Alessandro, did you already try a deep-scrub on pg 10.14?


        I'm waiting for the cluster to do that; I issued it earlier
        this morning.

            I expect it'll show an inconsistent object. Though, I'm
            unsure if repair will correct the crc given that in this
            case *all* replicas have a bad crc.


        Exactly, this is what I wonder too.
        Cheers,

            Alessandro


            --Dan

                However, I'm also quite curious how it ended up that
                way, with a checksum mismatch but identical data (and
                identical checksums!) across the three replicas. Have
                you previously done some kind of scrub repair on the
                metadata pool? Did the PG perhaps get backfilled due
                to cluster changes?
                -Greg


                    Thanks,


                          Alessandro



                    On 11/07/18 18:56, John Spray wrote:

                        On Wed, Jul 11, 2018 at 4:49 PM Alessandro De
                        Salvo <alessandro.desa...@roma1.infn.it> wrote:

                            Hi John,

                            in fact I get an I/O error by hand too:


                            rados get -p cephfs_metadata 200.00000000
                            200.00000000
                            error getting
                            cephfs_metadata/200.00000000: (5)
                            Input/output error

                        Next step would be to go look for
                        corresponding errors on your OSD
                        logs, system logs, and possibly also check
                        things like the SMART
                        counters on your hard drives for possible
                        root causes.
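
                        For example, something along these lines (the
                        OSD id, log path and device name are just
                        placeholders):

                        grep ERR /var/log/ceph/ceph-osd.23.log
                        dmesg -T | grep -i 'I/O error'
                        smartctl -a /dev/sdX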

                        John



                            Can this be recovered somehow?

                            Thanks,


                                   Alessandro


                            On 11/07/18 18:33, John Spray wrote:

                                On Wed, Jul 11, 2018 at 4:10 PM
                                Alessandro De Salvo
                                <alessandro.desa...@roma1.infn.it> wrote:

                                    Hi,

                                    after the upgrade to luminous
                                    12.2.6 today, all our MDSes have
                                    been marked as damaged. Trying to
                                    restart the instances only results
                                    in standby MDSes. We currently have
                                    2 active filesystems, with 2 MDSes
                                    each.

                                    I found the following error
                                    messages in the mon:


                                    mds.0 <node1_IP>:6800/2412911269
                                    down:damaged
                                    mds.1 <node2_IP>:6800/830539001
                                    down:damaged
                                    mds.0 <node3_IP>:6800/4080298733
                                    down:damaged


                                    Whenever I try to force the
                                    repaired state with ceph mds repaired
                                    <fs_name>:<rank> I get something
                                    like this in the MDS logs:


                                    2018-07-11 13:20:41.597970
                                    7ff7e010e700  0
                                    mds.1.journaler.mdlog(ro)
                                    error getting journal off disk
                                    2018-07-11 13:20:41.598173
                                    7ff7df90d700 -1
                                    log_channel(cluster) log
                                    [ERR] : Error recovering journal
                                    0x201: (5) Input/output error

                                An EIO reading the journal header is
                                pretty scary. The MDS itself
                                probably can't tell you much more
                                about this: you need to dig down
                                into the RADOS layer.  Try reading
                                the 200.00000000 object (that
                                happens to be the rank 0 journal
                                header, every CephFS filesystem
                                should have one) using the `rados`
                                command line tool.

                                John



                                    Any attempt at running the
                                    journal export results in errors
                                    like this one:


                                    cephfs-journal-tool
                                    --rank=cephfs:0 journal export
                                    backup.bin
                                    Error ((5) Input/output
                                    error)2018-07-11 17:01:30.631571
                                    7f94354fff00 -1
                                    Header 200.00000000 is unreadable

                                    2018-07-11 17:01:30.631584
                                    7f94354fff00 -1 journal_export:
                                    Journal not
                                    readable, attempt
                                    object-by-object dump with `rados`


                                    The same happens for recover_dentries:

                                    cephfs-journal-tool
                                    --rank=cephfs:0 event
                                    recover_dentries summary
                                    Events by type:2018-07-11
                                    17:04:19.770779 7f05429fef00 -1
                                    Header
                                    200.00000000 is unreadable
                                    Errors:
                                    0

                                    Is there something I could try in
                                    order to get the cluster back?

                                    I was able to dump the contents
                                    of the metadata pool with rados
                                    export
                                    -p cephfs_metadata <filename> and
                                    I'm currently trying the procedure
                                    described in
                                    http://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/#using-an-alternate-metadata-pool-for-recovery

                                    but I'm not sure if it will work
                                    as it's apparently doing nothing
                                    at the
                                    moment (maybe it's just very slow).

                                    Any help is appreciated, thanks!


                                            Alessandro

                                    










--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90




_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
