On Wed, Jul 11, 2018 at 10:25 PM Gregory Farnum <gfar...@redhat.com> wrote:
>
> On Wed, Jul 11, 2018 at 9:23 AM Alessandro De Salvo 
> <alessandro.desa...@roma1.infn.it> wrote:
>>
>> OK, I found where the object is:
>>
>>
>> ceph osd map cephfs_metadata 200.00000000
>> osdmap e632418 pool 'cephfs_metadata' (10) object '200.00000000' -> pg
>> 10.844f3494 (10.14) -> up ([23,35,18], p23) acting ([23,35,18], p23)
>>
>>
>> So, looking at the logs of osds 23, 35 and 18, I indeed see:
>>
>>
>> osd.23:
>>
>> 2018-07-11 15:49:14.913771 7efbee672700 -1 log_channel(cluster) log
>> [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on
>> 10:292cf221:::200.00000000:head
>>
>>
>> osd.35:
>>
>> 2018-07-11 18:01:19.989345 7f760291a700 -1 log_channel(cluster) log
>> [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on
>> 10:292cf221:::200.00000000:head
>>
>>
>> osd.18:
>>
>> 2018-07-11 18:18:06.214933 7fabaf5c1700 -1 log_channel(cluster) log
>> [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on
>> 10:292cf221:::200.00000000:head
>>
>>
>> So, basically the same error everywhere.
>>
>> I'm trying to issue a repair of pg 10.14, but I'm not sure whether it
>> will help.
>>
>> No SMART errors (the fileservers are SANs, with RAID6 + LVM volumes), and
>> no disk problems anywhere. No relevant errors in the syslogs either; the
>> hosts are fine. I cannot exclude an error on the RAID controllers, but two
>> of the OSDs holding 10.14 are on one SAN system and the third is on a
>> different one, so I would tend to exclude that both systems had (silent)
>> errors at the same time.
>
>
> That's fairly distressing. At this point I'd probably try extracting the 
> object using ceph-objectstore-tool and seeing if it decodes properly as an 
> mds journal. If it does, you might risk just putting it back in place to 
> overwrite the crc.
>
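
If you do end up going the ceph-objectstore-tool route, a rough sketch
of the extract/re-inject path would look something like the below. This
is untested on a damaged header, the data path and OSD id are simply
taken from this thread, the exact flags differ between FileStore and
BlueStore, and the OSD has to be stopped while the tool runs:

# on the host of osd.23, with osd.23 stopped:
# find the object (it can be addressed by name or by the JSON this prints)
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-23 \
    --op list 200.00000000

# extract the journal header object
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-23 \
    --pgid 10.14 200.00000000 get-bytes /tmp/200.00000000

# if the extracted header looks sane, write it back through librados
# so the object is rewritten (and its checksum recomputed)
rados -p cephfs_metadata put 200.00000000 /tmp/200.00000000

# then check whether the journal is readable again
cephfs-journal-tool --rank=cephfs:0 journal inspect

I'd wait for Greg to confirm that a plain full-object rados put is
really what he meant by "putting it back in place", though.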

That said, wouldn't it be easier to scrub-repair the PG to fix the crc?

Alessandro, did you already try a deep-scrub on pg 10.14? I expect
it'll show an inconsistent object. I'm not sure, though, whether a
repair will correct the crc, given that in this case *all* replicas
have a bad crc.
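
For reference, this is the sequence I had in mind (pg id taken from the
ceph osd map output earlier in the thread):

ceph pg deep-scrub 10.14

# once the scrub finishes, see what it flagged
rados list-inconsistent-obj 10.14 --format=json-pretty

# and only if that output makes sense (and Greg agrees it's safe):
ceph pg repair 10.14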

--Dan

> However, I'm also quite curious how it ended up that way, with a checksum 
> mismatch but identical data (and identical checksums!) across the three 
> replicas. Have you previously done some kind of scrub repair on the metadata 
> pool? Did the PG perhaps get backfilled due to cluster changes?
> -Greg
>
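
As for the backfill question: I think a plain

ceph pg 10.14 query

is the quickest way to check, Alessandro -- the recovery_state and
history sections should give a hint whether that PG went through
recent peering or backfill.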
>>
>>
>> Thanks,
>>
>>
>>      Alessandro
>>
>>
>>
>> On 11/07/18 18:56, John Spray wrote:
>> > On Wed, Jul 11, 2018 at 4:49 PM Alessandro De Salvo
>> > <alessandro.desa...@roma1.infn.it> wrote:
>> >> Hi John,
>> >>
>> >> in fact I get an I/O error when reading it by hand too:
>> >>
>> >>
>> >> rados get -p cephfs_metadata 200.00000000 200.00000000
>> >> error getting cephfs_metadata/200.00000000: (5) Input/output error
>> > Next step would be to go look for corresponding errors on your OSD
>> > logs, system logs, and possibly also check things like the SMART
>> > counters on your hard drives for possible root causes.
>> >
>> > John
>> >
>> >
>> >
>> >>
>> >> Can this be recovered somehow?
>> >>
>> >> Thanks,
>> >>
>> >>
>> >>       Alessandro
>> >>
>> >>
>> >> On 11/07/18 18:33, John Spray wrote:
>> >>> On Wed, Jul 11, 2018 at 4:10 PM Alessandro De Salvo
>> >>> <alessandro.desa...@roma1.infn.it> wrote:
>> >>>> Hi,
>> >>>>
>> >>>> after the upgrade to luminous 12.2.6 today, all our MDSes have been
>> >>>> marked as damaged. Trying to restart the instances only results in
>> >>>> standby MDSes. We currently have 2 active filesystems, with 2 MDSes each.
>> >>>>
>> >>>> I found the following error messages in the mon:
>> >>>>
>> >>>>
>> >>>> mds.0 <node1_IP>:6800/2412911269 down:damaged
>> >>>> mds.1 <node2_IP>:6800/830539001 down:damaged
>> >>>> mds.0 <node3_IP>:6800/4080298733 down:damaged
>> >>>>
>> >>>>
>> >>>> Whenever I try to force the repaired state with ceph mds repaired
>> >>>> <fs_name>:<rank> I get something like this in the MDS logs:
>> >>>>
>> >>>>
>> >>>> 2018-07-11 13:20:41.597970 7ff7e010e700  0 mds.1.journaler.mdlog(ro)
>> >>>> error getting journal off disk
>> >>>> 2018-07-11 13:20:41.598173 7ff7df90d700 -1 log_channel(cluster) log
>> >>>> [ERR] : Error recovering journal 0x201: (5) Input/output error
>> >>> An EIO reading the journal header is pretty scary.  The MDS itself
>> >>> probably can't tell you much more about this: you need to dig down
>> >>> into the RADOS layer.  Try reading the 200.00000000 object (that
>> >>> happens to be the rank 0 journal header, every CephFS filesystem
>> >>> should have one) using the `rados` command line tool.
>> >>>
>> >>> John
>> >>>
>> >>>
>> >>>
>> >>>> Any attempt to run the journal export results in errors, like this
>> >>>> one:
>> >>>>
>> >>>>
>> >>>> cephfs-journal-tool --rank=cephfs:0 journal export backup.bin
>> >>>> Error ((5) Input/output error)2018-07-11 17:01:30.631571 7f94354fff00 -1
>> >>>> Header 200.00000000 is unreadable
>> >>>>
>> >>>> 2018-07-11 17:01:30.631584 7f94354fff00 -1 journal_export: Journal not
>> >>>> readable, attempt object-by-object dump with `rados`
>> >>>>
>> >>>>
>> >>>> The same happens for recover_dentries:
>> >>>>
>> >>>> cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
>> >>>> Events by type:2018-07-11 17:04:19.770779 7f05429fef00 -1 Header
>> >>>> 200.00000000 is unreadable
>> >>>> Errors:
>> >>>> 0
>> >>>>
>> >>>> Is there anything I could try to get the cluster back?
>> >>>>
>> >>>> I was able to dump the contents of the metadata pool with rados export
>> >>>> -p cephfs_metadata <filename>, and I'm currently trying the procedure
>> >>>> described in
>> >>>> http://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/#using-an-alternate-metadata-pool-for-recovery
>> >>>> but I'm not sure whether it will work, as it seems to be doing nothing
>> >>>> at the moment (maybe it's just very slow).
>> >>>>
>> >>>> Any help is appreciated, thanks!
>> >>>>
>> >>>>
>> >>>>        Alessandro
>> >>>>
>>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
