Hello,

I have this error message:

2018-01-25 00:59:27.357916 7fd646ae1700 -1 osd.3 pg_epoch: 9393 pg[9.139s0(
v 8799'82397 (5494'79049,8799'82397] local-lis/les=9392/9393 n=10003
ec=1478/1478 lis/c 9392/6304 les/c/f 9393/6307/807 9391/9392/9392)
[3,6,12,9]/[3,6,2147483647,4] r=0 lpr=9392 pi=[6304,9392)/3 bft=9(3),12(2)
crt=8799'82397 lcod 0'0 mlcod 0'0
active+undersized+degraded+remapped+backfilling] recover_replicas: object
added to missing set for backfill, but is not in recovering, error!

in a 3+1 EC pool, and when I enable backfills, the OSD starts dying on
recovery, which makes the whole cluster flail around. While the cluster
works with this one degraded and remapped PG, not being able to switch
backfills on limits my options severely.
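
(By "switch on backfills" I mean the cluster-wide nobackfill flag, in
case the exact mechanism matters:)

    # pausing all backfill cluster-wide keeps the OSDs alive for now
    ceph osd set nobackfill

    # re-enabling backfill is what triggers the crashes
    ceph osd unset nobackfill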

Now, I can sort of guess why this is happening (I had to zero out some
sectors on a broken hard disk to recover what was left of an already
degraded EC PG, which I had messed up by editing the CRUSH map and
removing an OSD in the wrong order, I think), but how can I fix it?
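
(For reference, my understanding of the safe removal sequence is roughly
the sketch below; the OSD number is a made-up placeholder, not the one I
actually removed:)

    ID=42                          # placeholder OSD number
    ceph osd out $ID               # drain it and wait for rebalance to finish
    ceph osd crush remove osd.$ID  # only then take it out of the CRUSH map
    ceph auth del osd.$ID          # remove its key
    ceph osd rm $ID                # and finally drop it from the osdmap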

The only other occurrence of this I can find is a mailing list post that
also never got a resolution.

Since this cluster is used for CephFS, and the files on it are actually
recoverable from a different source, could I fix the issue by finding out
which files the broken objects belong to and just deleting/rewriting those
files?
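
(My understanding is that CephFS data objects are named
<inode-in-hex>.<block-number>, so given an object name I would hope to map
it back to a path with something like this sketch; the object name and
mount point here are made up:)

    # hypothetical broken object; the part before the dot is the
    # file's inode number in hex
    obj=10000000000.00000000

    # convert the hex inode to decimal and look it up in the mounted fs
    ino=$((16#${obj%%.*}))
    find /mnt/cephfs -inum "$ino"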

Also, how do I find out which objects are the problem, or can I only deal
with this at the level of a whole PG?
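
(The closest thing I've found in the docs is list_missing; I assume
something like this should show them, if the "missing set" in the log line
above is the same one these commands report on:)

    # list objects this pg considers missing/unfound
    ceph pg 9.139 list_missing

    # full peering/recovery state of the pg
    ceph pg 9.139 query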

Any help resolving this issue, or any insight into how to read that log
line, would be appreciated!

thanks,
Philip