On 07/17/2014 09:44 PM, Caius Howcroft wrote:
I wonder if someone can just clarify something for me.

I have a cluster which I have upgraded to Firefly. I'm seeing PG
inconsistencies due to the recently reported XFS bug. I'm running
pg repair X.YYY and I would like to understand what, exactly, this
is doing. It looks like it's copying from the primary to the other
two (if size=3), but does it still do this if the primary is the
odd one out? i.e. what happens if the primary gets corrupted? I
thought pg repair should fail in this case, but now I'm not so sure.


If the primary OSD is down, CRUSH will select a different OSD as the primary.

Ceph doesn't know which object is corrupted. It simply knows that a replica does not have the same copy as the primary. Repair then copies the primary's version over the replicas, even when the primary itself holds the corrupted copy.

btrfs could help here, since it does online checksumming; XFS doesn't.
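To make the risk concrete, here is a toy Python sketch (not Ceph code, just an illustration) of why a repair that blindly copies the primary can propagate corruption when the primary itself is the bad copy:

```python
import hashlib

def checksum(data: bytes) -> str:
    """Checksum of an object's contents (stand-in for a scrub digest)."""
    return hashlib.sha256(data).hexdigest()

def scrub(replicas: list) -> bool:
    """A deep scrub only detects that copies disagree; without per-object
    checksums it cannot tell which copy is the corrupted one."""
    digests = {checksum(r) for r in replicas}
    return len(digests) == 1  # consistent iff all digests match

def repair(replicas: list) -> list:
    """Simplified Firefly-era repair semantics: overwrite every replica
    with the primary's copy (replicas[0]), whatever that copy contains."""
    return [replicas[0]] * len(replicas)

# size=3 PG where a *replica* is corrupted: repair restores good data.
replica_bad = [b"object-data", b"object-data", b"XXXXXX-data"]
assert not scrub(replica_bad)
assert repair(replica_bad) == [b"object-data"] * 3

# Same PG, but the *primary* is corrupted: repair propagates bad data.
primary_bad = [b"XXXXXX-data", b"object-data", b"object-data"]
assert not scrub(primary_bad)
assert repair(primary_bad) == [b"XXXXXX-data"] * 3
```

This is why a checksumming filesystem underneath (or object-level digests) matters: it gives the scrub a way to identify the bad copy instead of only noticing a mismatch.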

Also, is there a way to get information about which objects, and on
which OSDs, are inconsistent? Basically the stuff I see in the mon
logs, but from a JSON dump or the admin socket. I would like to
track these errors better by feeding them into our metrics collection.

$ ceph pg <pg id> query

That should tell you more.
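For feeding a metrics pipeline, one option is to poll the output of 'ceph pg dump --format json' and filter on PG state. A minimal Python sketch, assuming the dump exposes a pg_stats list whose entries carry pgid and state fields (check the actual JSON layout on your version; the sample data below is hypothetical):

```python
import json

def inconsistent_pgs(pg_dump_json: str) -> list:
    """Return the IDs of inconsistent PGs from the JSON output of
    'ceph pg dump --format json'. Assumes a 'pg_stats' list whose
    entries have 'pgid' and 'state' fields."""
    dump = json.loads(pg_dump_json)
    return [pg["pgid"]
            for pg in dump.get("pg_stats", [])
            if "inconsistent" in pg["state"]]

# Hypothetical sample of what the dump might contain:
sample = json.dumps({
    "pg_stats": [
        {"pgid": "0.1",  "state": "active+clean"},
        {"pgid": "0.2f", "state": "active+clean+inconsistent"},
    ]
})
print(inconsistent_pgs(sample))  # ['0.2f']
```

You could run this on a cron interval and push the count (and the PG IDs) into your metrics system, then drill into a specific PG with 'ceph pg <pg id> query'.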


Thanks
Caius




--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
