Hi,

I have a production cluster on which one OSD on a failing disk was slowing
the whole cluster down. I removed the OSD (osd.87) as I usually do in such
cases, but this time it resulted in 17 unfound objects. I no longer have the
files from osd.87. I was able to run "ceph pg PGID mark_unfound_lost delete"
on 10 of those objects.

On the remaining 7 objects the command blocks. When I try to run "ceph pg
PGID query" on the affected PG, it also blocks. I suspect this is the same
reason why mark_unfound_lost blocks.

Other client IO to PGs that have unfound objects is also blocked. When I
try to query the OSDs which hold the PG with unfound objects, "ceph tell"
blocks.

I tried to mark the PG as complete using ceph-objectstore-tool, but it did
not help: the PG is in fact complete, but for some reason it still blocks.

I also tried recreating an empty osd.87 and importing the PG exported from
another replica, but that did not help either.

Can someone help me please? This is really important.

ceph pg dump:
https://gist.github.com/anonymous/c0622ef0d8c0ac84e0778e73bad3c1af/raw/206a06e674ed1c870bbb09bb75fe4285a8e20ba4/pg-dump

ceph osd dump:
https://gist.github.com/anonymous/64e237d85016af6bd7879ef272ca5639/raw/d6fceb9acd206b75c3ce59c60bcd55a47dea7acd/osd-dump

ceph health detail:
https://gist.github.com/anonymous/ddb27863ecd416748ebd7ebbc036e438/raw/59ef1582960e011f10cbdbd4ccee509419b95d4e/health-detail


-- 
Regards,
Tomasz Kuzemko
tom...@kuzemko.net