On Wed, Jul 11, 2018 at 10:25 PM Gregory Farnum <gfar...@redhat.com> wrote:
>
> On Wed, Jul 11, 2018 at 9:23 AM Alessandro De Salvo
> <alessandro.desa...@roma1.infn.it> wrote:
>>
>> OK, I found where the object is:
>>
>>
>> ceph osd map cephfs_metadata 200.00000000
>> osdmap e632418 pool 'cephfs_metadata' (10) object '200.00000000' -> pg
>> 10.844f3494 (10.14) -> up ([23,35,18], p23) acting ([23,35,18], p23)
>>
>>
>> So, looking at the osds 23, 35 and 18 logs in fact I see:
>>
>>
>> osd.23:
>>
>> 2018-07-11 15:49:14.913771 7efbee672700 -1 log_channel(cluster) log
>> [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on
>> 10:292cf221:::200.00000000:head
>>
>>
>> osd.35:
>>
>> 2018-07-11 18:01:19.989345 7f760291a700 -1 log_channel(cluster) log
>> [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on
>> 10:292cf221:::200.00000000:head
>>
>>
>> osd.18:
>>
>> 2018-07-11 18:18:06.214933 7fabaf5c1700 -1 log_channel(cluster) log
>> [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on
>> 10:292cf221:::200.00000000:head
>>
>>
>> So, basically the same error everywhere.
>>
>> I'm trying to issue a repair of the pg 10.14, but I'm not sure if it may
>> help.
>>
>> No SMART errors (the fileservers are SANs, in RAID6 + LVM volumes), and
>> no disk problems anywhere. No relevant errors in syslogs, the hosts are
>> just fine. I cannot exclude an error on the RAID controllers, but 2 of
>> the OSDs with 10.14 are on a SAN system and one on a different one, so I
>> would tend to exclude they both had (silent) errors at the same time.
>
>
> That's fairly distressing. At this point I'd probably try extracting the
> object using ceph-objectstore-tool and seeing if it decodes properly as an
> mds journal. If it does, you might risk just putting it back in place to
> overwrite the crc.
>
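If you do go the ceph-objectstore-tool route, the extraction would look
roughly like this (an untested sketch only; the OSD id comes from the
`ceph osd map` output above and the data path assumes the default directory
layout, so adjust both to your setup):

  # the OSD has to be stopped while the offline tool touches its store
  systemctl stop ceph-osd@23
  # locate the object in pg 10.14 and print its JSON identifier
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-23 \
      --pgid 10.14 --op list 200.00000000
  # dump the object's bytes to a file, passing in the JSON printed above
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-23 \
      --pgid 10.14 '<json-from-list>' get-bytes /tmp/200.00000000
  systemctl start ceph-osd@23

Doing that on all three OSDs and comparing the dumps (md5sum) would at least
confirm the copies really are identical before anyone risks writing one back
with set-bytes.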
Wouldn't it be easier to run a scrub repair on the PG to fix the crc?

Alessandro, have you already tried a deep-scrub on pg 10.14? I expect it
will show an inconsistent object (the exact commands I have in mind are
sketched at the very bottom of this mail). I'm unsure, though, whether
repair will correct the crc in this case, given that *all* replicas have a
bad crc.

--Dan

> However, I'm also quite curious how it ended up that way, with a checksum
> mismatch but identical data (and identical checksums!) across the three
> replicas. Have you previously done some kind of scrub repair on the metadata
> pool? Did the PG perhaps get backfilled due to cluster changes?
> -Greg
>
>>
>>
>> Thanks,
>>
>>
>> Alessandro
>>
>>
>>
>> On 11/07/18 18:56, John Spray wrote:
>> > On Wed, Jul 11, 2018 at 4:49 PM Alessandro De Salvo
>> > <alessandro.desa...@roma1.infn.it> wrote:
>> >> Hi John,
>> >>
>> >> in fact I get an I/O error by hand too:
>> >>
>> >>
>> >> rados get -p cephfs_metadata 200.00000000 200.00000000
>> >> error getting cephfs_metadata/200.00000000: (5) Input/output error
>> > Next step would be to go look for corresponding errors on your OSD
>> > logs, system logs, and possibly also check things like the SMART
>> > counters on your hard drives for possible root causes.
>> >
>> > John
>> >
>> >
>> >
>> >>
>> >> Can this be recovered someway?
>> >>
>> >> Thanks,
>> >>
>> >>
>> >> Alessandro
>> >>
>> >>
>> >> On 11/07/18 18:33, John Spray wrote:
>> >>> On Wed, Jul 11, 2018 at 4:10 PM Alessandro De Salvo
>> >>> <alessandro.desa...@roma1.infn.it> wrote:
>> >>>> Hi,
>> >>>>
>> >>>> after the upgrade to luminous 12.2.6 today, all our MDSes have been
>> >>>> marked as damaged. Trying to restart the instances only result in
>> >>>> standby MDSes. We currently have 2 filesystems active and 2 MDSes each.
>> >>>>
>> >>>> I found the following error messages in the mon:
>> >>>>
>> >>>>
>> >>>> mds.0 <node1_IP>:6800/2412911269 down:damaged
>> >>>> mds.1 <node2_IP>:6800/830539001 down:damaged
>> >>>> mds.0 <node3_IP>:6800/4080298733 down:damaged
>> >>>>
>> >>>>
>> >>>> Whenever I try to force the repaired state with ceph mds repaired
>> >>>> <fs_name>:<rank> I get something like this in the MDS logs:
>> >>>>
>> >>>>
>> >>>> 2018-07-11 13:20:41.597970 7ff7e010e700 0 mds.1.journaler.mdlog(ro)
>> >>>> error getting journal off disk
>> >>>> 2018-07-11 13:20:41.598173 7ff7df90d700 -1 log_channel(cluster) log
>> >>>> [ERR] : Error recovering journal 0x201: (5) Input/output error
>> >>> An EIO reading the journal header is pretty scary. The MDS itself
>> >>> probably can't tell you much more about this: you need to dig down
>> >>> into the RADOS layer. Try reading the 200.00000000 object (that
>> >>> happens to be the rank 0 journal header, every CephFS filesystem
>> >>> should have one) using the `rados` command line tool.
>> >>>
>> >>> John
>> >>>
>> >>>
>> >>>
>> >>>> Any attempt of running the journal export results in errors, like this
>> >>>> one:
>> >>>>
>> >>>>
>> >>>> cephfs-journal-tool --rank=cephfs:0 journal export backup.bin
>> >>>> Error ((5) Input/output error)2018-07-11 17:01:30.631571 7f94354fff00 -1
>> >>>> Header 200.00000000 is unreadable
>> >>>>
>> >>>> 2018-07-11 17:01:30.631584 7f94354fff00 -1 journal_export: Journal not
>> >>>> readable, attempt object-by-object dump with `rados`
>> >>>>
>> >>>>
>> >>>> Same happens for recover_dentries
>> >>>>
>> >>>> cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
>> >>>> Events by type:2018-07-11 17:04:19.770779 7f05429fef00 -1 Header
>> >>>> 200.00000000 is unreadable
>> >>>> Errors:
>> >>>> 0
>> >>>>
>> >>>> Is there something I could try to do to have the cluster back?
>> >>>>
>> >>>> I was able to dump the contents of the metadata pool with rados export
>> >>>> -p cephfs_metadata <filename> and I'm currently trying the procedure
>> >>>> described in
>> >>>> http://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/#using-an-alternate-metadata-pool-for-recovery
>> >>>> but I'm not sure if it will work as it's apparently doing nothing at the
>> >>>> moment (maybe it's just very slow).
>> >>>>
>> >>>> Any help is appreciated, thanks!
>> >>>>
>> >>>>
>> >>>> Alessandro
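For completeness, this is the deep-scrub sequence I was referring to above.
The commands themselves are standard; the open question is only whether
repair can do anything useful when all three replicas carry the same
mismatching crc:

  ceph pg deep-scrub 10.14
  # once the deep-scrub has finished, list what it found:
  rados list-inconsistent-obj 10.14 --format=json-pretty
  # and only if you then decide to go ahead:
  ceph pg repair 10.14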