Re: [ceph-users] Still seing scrub errors in .80.5

Gregory Farnum Tue, 16 Sep 2014 11:41:23 -0700

Ah, you're right — it wasn't popping up in the same searches and I'd
forgotten that was so recent.


In that case, did you actually deep scrub *everything* in the cluster,
Marc? You'll need to deep scrub and repair every PG in the cluster, and
the background deep scrubbing doesn't move through the data very quickly.
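
For illustration only, here is a minimal Python sketch of one way to see
which PGs reported deep-scrub errors and to build the per-PG commands. It
assumes the ceph.log summary line format quoted further down in this
thread, with each entry on a single line. The function names are
hypothetical, and the `ceph pg deep-scrub <pgid>` invocations are only
assembled as argument lists here, not executed.

```python
import re

# Matches cluster-log summary lines such as:
#   ... [ERR] 3.335 deep-scrub 2 errors
# Assumption: this is the format shown in the quoted ceph.log excerpt;
# other Ceph releases may word the line differently.
SCRUB_ERR = re.compile(
    r"\[ERR\]\s+(\d+\.[0-9a-f]+)\s+deep-scrub\s+(\d+)\s+errors"
)


def pgs_with_scrub_errors(log_text):
    """Return {pgid: error_count} for each deep-scrub error summary found."""
    found = {}
    for pgid, count in SCRUB_ERR.findall(log_text):
        found[pgid] = int(count)
    return found


def deep_scrub_commands(pgids):
    """Build one 'ceph pg deep-scrub <pgid>' argument list per PG id."""
    return [["ceph", "pg", "deep-scrub", pgid] for pgid in sorted(pgids)]


if __name__ == "__main__":
    sample = (
        "2014-09-15 07:56:28.485734 osd.15 10.10.10.55:6804/23853 367 : "
        "[ERR] 3.335 deep-scrub 2 errors\n"
    )
    errors = pgs_with_scrub_errors(sample)
    print(errors)
    for cmd in deep_scrub_commands(errors):
        print(" ".join(cmd))
```

The argument lists could be handed to subprocess.run once you are happy
with them. To cover the whole cluster, the PG ids would have to come from
the full PG listing rather than from the log.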
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Tue, Sep 16, 2014 at 11:32 AM, Dan Van Der Ster
<[email protected]> wrote:
> Hi Greg,
> I believe Marc is referring to the corruption triggered by set_extsize on
> xfs. That option was disabled by default in 0.80.4... See the thread
> "firefly scrub error".
> Cheers,
> Dan
>
>
>
> From: Gregory Farnum <[email protected]>
> Sent: Sep 16, 2014 8:15 PM
> To: Marc
> Cc: [email protected]
> Subject: Re: [ceph-users] Still seing scrub errors in .80.5
>
> On Tue, Sep 16, 2014 at 12:03 AM, Marc <[email protected]> wrote:
>> Hello fellow cephalopods,
>>
>> every deep scrub seems to dig up inconsistencies (i.e. scrub errors)
>> that we could use some help with diagnosing.
>>
>> I understand there used to be a data corruption issue before .80.3 so we
>> made sure that all the nodes were upgraded to .80.5 and all the daemons
>> were restarted (they all report .80.5 when contacted via socket).
>> *After* that we ran a deep scrub, which obviously found errors, which we
>> then repaired. But unfortunately, it's now a week later, and the next
>> deep scrub has dug up new errors, which shouldn't have happened I
>> think...?
>>
>> ceph.log shows these errors in between the deep scrub messages:
>>
>> 2014-09-15 07:56:23.164818 osd.15 10.10.10.55:6804/23853 364 : [ERR]
>> 3.335 shard 2: soid
>> 6ba68735/rbd_data.59e3c2ae8944a.00000000000006b1/head//3 digest
>> 3090820441 != known digest 3787996302
>> 2014-09-15 07:56:23.164827 osd.15 10.10.10.55:6804/23853 365 : [ERR]
>> 3.335 shard 6: soid
>> 6ba68735/rbd_data.59e3c2ae8944a.00000000000006b1/head//3 digest
>> 3259686791 != known digest 3787996302
>> 2014-09-15 07:56:28.485713 osd.15 10.10.10.55:6804/23853 366 : [ERR]
>> 3.335 deep-scrub 0 missing, 1 inconsistent objects
>> 2014-09-15 07:56:28.485734 osd.15 10.10.10.55:6804/23853 367 : [ERR]
>> 3.335 deep-scrub 2 errors
>
> Uh, I'm afraid those errors were never output as a result of bugs in
> Firefly. These are indicating actual data differences between the
> nodes, whereas the Firefly issue was a metadata flag that wasn't
> handled properly in mixed-version OSD clusters.
>
> I don't think Ceph has ever had a bug that would change the data
> payload between OSDs. Searching the tracker logs, the only causes
> recorded against this error message are:
> 1) The local filesystem is misbehaving under the workload we give
> it (and there are no known filesystem issues that are exposed by
> running firefly OSDs in default config that I can think of, certainly
> none with this error)
> 2) The disks themselves are bad.
>
> :/
>
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com