Dear all,
We're running Ceph Luminous and we recently hit an issue with some OSDs
(OSDs getting automatically marked out, IO/CPU overload) which unfortunately
left us with one placement group stuck in the state "stale+active+clean".
It's a placement group from the .rgw.root pool:
1.15  0 0 0 0 0 0  1 1  stale+active+clean  2020-05-11 23:22:51.396288  40'1  2142:152  [3,2,6]  3  [3,2,6]  3  40'1  2020-04-22 00:46:05.904418  40'1  2020-04-20 20:18:13.371396  0
I guess there is no active replica of that placement group anywhere in the cluster.
Restarting the osd.3, osd.2 and osd.6 daemons does not help.
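For what it's worth, the restarts were done via systemd on the hosts carrying
those OSDs, roughly like this:

  systemctl restart ceph-osd@3    # likewise ceph-osd@2 and ceph-osd@6 on their hosts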
I've used ceph-objectstore-tool and successfully exported the placement group
from osd.3, osd.2 and osd.6, and then tried to import it onto a completely
different OSD. The exports differ slightly in file size, but the export from
osd.3, which was the last primary, is the largest, so that is the one I tried
to import. The commands were roughly as follows (run with the source and
destination OSDs stopped; paths and file names are just how they look on our
setup):
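  # export pg 1.15 from the old primary (add --journal-path on filestore OSDs)
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 --pgid 1.15 --op export --file /root/pg1.15.osd3.export

  # import it onto a different OSD
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 --pgid 1.15 --op import --file /root/pg1.15.osd3.export

When osd.1 starts back up after the import, I see the following in its log: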
2020-05-14 21:43:19.779740 7f7880ac3700 1 osd.1 pg_epoch: 2459 pg[1.15( v 40'1
(0'0,40'1] local-lis/les=2073/2074 n=0 ec=73/39 lis/c 2073/2073 les/c/f
2074/2074/633 2145/39/2145) [] r=-1 lpr=2455 crt=40'1 lcod 0'0 unknown NOTIFY]
state<Start>: transitioning to Stray
I can see from previous pg dumps (from several weeks ago, while the pg was
still active+clean) that it held 115 bytes and zero objects, but I am not sure
how to interpret that.
As this is a pg from the .rgw.root pool, I cannot get any response from the
cluster when accessing it; everything times out.
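For example (just to illustrate what I mean by "accessing"), even basic things
like these never return:

  ceph pg 1.15 query
  rados -p .rgw.root ls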
What is the correct course of action with this pg?
Any help would be greatly appreciated.
Thanks,
Tomislav