Historically I have often (but not always) found that removing / destroying the
affected OSD would clear the inconsistent PG.  At one point the logged message
was clear about which OSD reported the error and which was the culprit, but a
later release broke that.  I'm not sure what recent releases say, since with
Luminous I rarely saw them.  Perhaps HDD behavior is more conducive to these
errors.

Depending on your device, unrecovered read errors may not warrant replacement;
they often represent routine slipped / reallocated blocks.  In such cases
rewriting the data is sufficient.  With older releases, redeploying the OSD (or
surgically excising the affected data) would suffice.
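As a concrete sketch of spotting which OSDs are implicated, here is how one
might filter the JSON from `rados list-inconsistent-obj <pgid> --format=json`
for shards with a read_error.  The sample JSON below is hypothetical and
abbreviated; the exact fields vary by release, but the per-shard "errors"
attribute is what the documentation points at:

```python
import json

# Hypothetical, abbreviated output of:
#   rados list-inconsistent-obj <pgid> --format=json-pretty
# Field names follow the documented format; values here are made up.
sample = '''
{
  "epoch": 1234,
  "inconsistents": [
    {
      "object": {"name": "rbd_data.1234", "snap": "head"},
      "errors": [],
      "union_shard_errors": ["read_error"],
      "shards": [
        {"osd": 5, "errors": []},
        {"osd": 12, "errors": ["read_error"]}
      ]
    }
  ]
}
'''

def osds_with_read_errors(report):
    """Return the set of OSD ids whose shards reported a read_error."""
    bad = set()
    for obj in report.get("inconsistents", []):
        for shard in obj.get("shards", []):
            if "read_error" in shard.get("errors", []):
                bad.add(shard["osd"])
    return bad

print(osds_with_read_errors(json.loads(sample)))  # -> {12}
```

With the culprit OSD identified, you can then check that drive's SMART data
before deciding between repair and replacement.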

With Nautilus I’m told that ceph-osd (or BlueStore?) will rewrite
automagically and the OSD will not need to be reprovisioned.  It would still
be a good idea to keep an eye on escalating rates of reallocation / a
dwindling percentage of spare blocks remaining.  One SSD manufacturer told me
that when remaining spares get down to 13%, performance will be impacted by
10% and the drive should be considered about to fail.
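A rough sketch of that kind of monitoring, parsing `smartctl -A` output for
the reallocation / spares attributes (the sample lines are illustrative, the
attribute names vary by vendor, and the 13% floor is just the figure quoted
above):

```python
# Hypothetical excerpt of `smartctl -A /dev/sdX` output; real attribute
# names and columns differ between vendors and drive types.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       8
170 Available_Reservd_Space 0x0033   011   011   010    Pre-fail  Always       -       0
"""

WATCH = {"Reallocated_Sector_Ct", "Available_Reservd_Space"}
SPARES_FLOOR = 13  # per the SSD manufacturer's guidance quoted above

def check_smart(text):
    """Flag watched SMART attributes; warn when normalized spares <= floor."""
    warnings = []
    for line in text.splitlines():
        fields = line.split()
        # smartctl -A columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED
        #                      WHEN_FAILED RAW_VALUE
        if len(fields) < 10 or fields[1] not in WATCH:
            continue
        name, value, raw = fields[1], int(fields[3]), fields[9]
        if name == "Available_Reservd_Space" and value <= SPARES_FLOOR:
            warnings.append(f"{name}: normalized value {value} <= {SPARES_FLOOR}")
        if name == "Reallocated_Sector_Ct" and int(raw) > 0:
            warnings.append(f"{name}: raw count {raw} > 0")
    return warnings

for w in check_smart(SAMPLE):
    print(w)
```

The point is the trend, not any single reading: a slowly growing raw
reallocation count is routine, a rapidly growing one is not.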

I’ve seen both an HDD model and an SSD model with design / firmware flaws that
were tickled by specific Ceph access patterns, so if you experience a pandemic
of these errors across drives, there may be more to it.



> On May 23, 2020, at 3:18 AM, Massimo Sgaravatto 
> <[email protected]> wrote:
> 
> When I see this problem usually:
> 
> - I run pg repair
> - I remove the OSD from the cluster
> - I replace the disk
> - I recreate the OSD on the new disk
> 
> Cheers, Massimo
> 
>> On Wed, May 20, 2020 at 9:41 PM Peter Lewis <[email protected]> wrote:
>> 
>> Hello,
>> 
>> I came across a section of the documentation that I don't quite
>> understand.  In the section about inconsistent PGs it says if one of the
>> shards listed in `rados list-inconsistent-obj` has a read_error the disk is
>> probably bad.
>> 
>> Quote from documentation:
>> 
>> https://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#pgs-inconsistent
>> `If read_error is listed in the errors attribute of a shard, the
>> inconsistency is likely due to disk errors. You might want to check your
>> disk used by that OSD.`
>> 
>> I determined that the disk is bad by looking at the output of smartctl.  I
>> would think that replacing the disk by removing the OSD from the cluster
>> and allowing the cluster to recover would fix this inconsistency error
>> without having to run `ceph pg repair`.
>> 
>> Can I just replace the OSD and the inconsistency will be resolved by the
>> recovery?  Or would it be better to run `ceph pg repair` and then replace
>> the OSD associated with that bad disk?
>> 
>> Thanks!
>> _______________________________________________
>> ceph-users mailing list -- [email protected]
>> To unsubscribe send an email to [email protected]
>> 