Hello,
We have an issue on one of our clusters. One node with 9 OSDs was down
for more than 12 hours. During that time the cluster recovered without
problems. When the host came back to the cluster we got two PGs in the
incomplete state. We decided to mark the OSDs on this host as out, but
the two PGs are still incomplete. Trying to query those PGs hangs
forever. We have already tried restarting the OSDs. Is there any way to
solve this issue without losing data? Any help appreciated :)
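In case it helps, this is roughly what the hanging queries look like
(wrapping them in coreutils timeout is our addition here so the shell
gets control back; the PG IDs are from the health output below):

# timeout 30 ceph pg 3.2929 query
# timeout 30 ceph pg 3.1683 query

Neither prints anything; they only exit when the timeout fires.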
# ceph health detail | grep incomplete
HEALTH_WARN 2 pgs incomplete; 2 pgs stuck inactive; 2 pgs stuck unclean;
200 requests are blocked > 32 sec; 2 osds have slow requests;
noscrub,nodeep-scrub flag(s) set
pg 3.2929 is stuck inactive since forever, current state incomplete,
last acting [109,272,83]
pg 3.1683 is stuck inactive since forever, current state incomplete,
last acting [166,329,281]
pg 3.2929 is stuck unclean since forever, current state incomplete, last
acting [109,272,83]
pg 3.1683 is stuck unclean since forever, current state incomplete, last
acting [166,329,281]
pg 3.1683 is incomplete, acting [166,329,281] (reducing pool vms
min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 3.2929 is incomplete, acting [109,272,83] (reducing pool vms min_size
from 2 may help; search ceph.com/docs for 'incomplete')
The directory for PG 3.1683 is present on OSD 166 and contains ~8GB.
We haven't tried setting min_size to 1 yet (we treat it as a last resort).
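If we do end up going that route, our rough plan (a sketch only; the
paths are examples for our layout, and the OSD has to be stopped before
running ceph-objectstore-tool against its store) would be to export the
surviving PG copy first as a backup, then lower min_size and revert it
as soon as the PGs peer:

# stop ceph-osd id=166    (upstart; on systemd: systemctl stop ceph-osd@166)
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-166 \
      --journal-path /var/lib/ceph/osd/ceph-166/journal \
      --pgid 3.1683 --op export --file /root/pg.3.1683.export
# ceph osd pool set vms min_size 1
(wait for the PGs to peer, go active, and recover)
# ceph osd pool set vms min_size 2

Does that look sane, or is there a safer order of operations?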
Some cluster info:
# ceph --version
ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
# ceph -s
health HEALTH_WARN
2 pgs incomplete
2 pgs stuck inactive
2 pgs stuck unclean
200 requests are blocked > 32 sec
noscrub,nodeep-scrub flag(s) set
monmap e7: 5 mons at
{mon-03=*.2:6789/0,mon-04=*.36:6789/0,mon-05=*.81:6789/0,mon-06=*.0:6789/0,mon-07=*.40:6789/0}
election epoch 3250, quorum 0,1,2,3,4
mon-06,mon-07,mon-04,mon-03,mon-05
osdmap e613040: 346 osds: 346 up, 337 in
flags noscrub,nodeep-scrub
pgmap v27163053: 18624 pgs, 6 pools, 138 TB data, 39062 kobjects
415 TB used, 186 TB / 601 TB avail
18622 active+clean
2 incomplete
client io 9992 kB/s rd, 64867 kB/s wr, 8458 op/s
# ceph osd pool get vms pg_num
pg_num: 16384
# ceph osd pool get vms size
size: 3
# ceph osd pool get vms min_size
min_size: 2
--
PS
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com