Hello,
we have had some trouble with OSDs running full,
even after rebalancing. With the disks at 100% usage and the ceph-osd
daemons no longer starting, we decided to delete some pg directories,
after which rebalancing finished.
However, since then one pg does not become clean anymore.
We tried to:
a) stop, then stop + mark out osd.7 -> after rebalancing the pg is still stuck
b) mark objects lost: root@wein:~# ceph pg 3.14 mark_unfound_lost revert
pg has no unfound objects
c) stop osd.7, rsync the directory 3.14_head over from osd.2, start osd.7
d) run ceph pg scrub 3.14
So far the pg is still down.
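For reference, the peering details behind the stuck state can be inspected with pg query (a standard command we have not pasted output from here); the recovery_state section of its JSON output normally explains why peering is blocked:

```shell
# Query the stuck pg directly; the "recovery_state" section of the JSON
# output shows why peering cannot complete (e.g. which OSDs it still
# wants to probe before going active).
ceph pg 3.14 query
```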
I have attached some of the lines / logs.
I would be grateful for any hints on how to repair this situation.
Cheers,
Nico
P.S.: We are using ceph 0.80.7.
Action causing the problem:
root@wein:/var/lib/ceph/osd/ceph-7/current# ls
0.12_head 0.a_head 2.1c_head 3.2a_head 3.4c_head 3.6b_TEMP 3.8b_head
3.97_TEMP 3.c7_TEMP
0.14_head 1.10_head 2.26_head 3.32_head 3.4c_TEMP 3.6c_head 3.8d_head
3.9b_head 3.c_head
0.21_head 1.1a_head 2.2a_head 3.32_TEMP 3.56_head 3.6c_TEMP 3.8d_TEMP
3.9b_TEMP 3.d_head
0.23_head 1.21_head 2.2e_head 3.37_head 3.56_TEMP 3.6_head 3.8e_head
3.a9_head 3.d_TEMP
0.2b_head 1.2b_head 2.2f_head 3.37_TEMP 3.5b_head 3.7b_head 3.8_head
3.a9_TEMP 3.f_head
0.2d_head 1.2c_head 2.33_head 3.47_head 3.5b_TEMP 3.7b_TEMP 3.91_head
3.ab_TEMP 3.f_TEMP
0.2e_head 1.32_head 2.3f_head 3.47_TEMP 3.60_head 3.80_head 3.91_TEMP
3.b2_TEMP commit_op_seq
0.2_head 1.37_head 2.b_head 3.49_head 3.61_head 3.81_head 3.93_head
3.b7_TEMP meta
0.38_head 1.3c_head 3.0_head 3.49_TEMP 3.61_TEMP 3.82_head 3.93_TEMP
3.bf_head nosnap
0.3b_head 1.e_head 3.12_head 3.4a_head 3.67_head 3.82_TEMP 3.94_head
3.bf_TEMP omap
0.3e_head 2.10_head 3.14_head 3.4a_TEMP 3.67_TEMP 3.89_head 3.94_TEMP
3.b_head
0.7_head 2.15_head 3.14_TEMP 3.4b_head 3.6b_head 3.89_TEMP 3.97_head
3.b_TEMP
root@wein:/var/lib/ceph/osd/ceph-7/current# du -sh 3.14_*
3.9G 3.14_head
4.0K 3.14_TEMP
The current status:
root@kaffee:~# ceph -s
cluster e0611730-09ff-4f3c-bfdb-2dd415274a36
health HEALTH_WARN 1 pgs down; 1 pgs peering; 1 pgs stuck inactive; 1
pgs stuck unclean; 5 requests are blocked > 32 sec
monmap e3: 3 mons at
{kaffee=192.168.40.1:6789/0,tee=192.168.40.2:6789/0,wein=192.168.40.3:6789/0},
election epoch 3652, quorum 0,1,2 kaffee,tee,wein
osdmap e1129: 8 osds: 7 up, 7 in
pgmap v435448: 448 pgs, 4 pools, 976 GB data, 248 kobjects
1938 GB used, 9913 GB / 11852 GB avail
447 active+clean
1 down+peering
root@wein:/var/lib/ceph/osd/ceph-7/current# ceph health detail
HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 1 pgs stuck unclean; 5
requests are blocked > 32 sec; 1 osds have slow requests
pg 3.14 is stuck inactive for 135697.438689, current state incomplete, last
acting [2,7]
pg 3.14 is stuck unclean for 135697.438702, current state incomplete, last
acting [2,7]
pg 3.14 is incomplete, acting [2,7]
5 ops are blocked > 8388.61 sec
5 ops are blocked > 8388.61 sec on osd.2
1 osds have slow requests
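To see what the 5 blocked ops on osd.2 are actually waiting on, the OSD's admin socket can be queried (an extra diagnostic step; the socket path below assumes the default location):

```shell
# Dump in-flight operations on osd.2 via its admin socket (default path
# assumed); each entry shows the op's age and the event it is waiting on.
ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok dump_ops_in_flight
```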
root@wein:~# ceph pg dump_stuck stale
ok
root@wein:~# ceph pg dump_stuck unclean
ok
pg_stat objects mip degr unf bytes log disklog state
state_stamp v reported up up_primary acting acting_primary
last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
3.14 1006 0 0 0 4135415824 3001 3001
incomplete 2014-12-19 14:40:00.272775 589'27399 1150:66317
[2,7] 2 [2,7] 2 503'24268 2014-12-13 19:17:39.272720
503'24268 2014-12-13 19:17:38.672258
root@wein:~# ceph pg dump_stuck inactive
ok
pg_stat objects mip degr unf bytes log disklog state
state_stamp v reported up up_primary acting acting_primary
last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
3.14 1006 0 0 0 4135415824 3001 3001
incomplete 2014-12-19 14:40:00.272775 589'27399 1150:66317
[2,7] 2 [2,7] 2 503'24268 2014-12-13 19:17:39.272720
503'24268 2014-12-13 19:17:38.672258
root@wein:~#
root@wein:~# ceph osd tree
# id weight type name up/down reweight
-1 2.3 root default
-2 0.2999 host wein
0 0.04999 osd.0 up 1
3 0.04999 osd.3 up 1
4 0.04999 osd.4 up 1
5 0.04999 osd.5 up 1
6 0.04999 osd.6 up 1
7 0.04999 osd.7 up 1
-3 1 host tee
1 5.5 osd.1 up 1
-4 1 host kaffee
2 5.5 osd.2 up 1
root@wein:~#
Fixes we tried:
root@wein:~# ceph pg 3.14 mark_unfound_lost revert
pg has no unfound objects
root@kaffee:~# rsync -av /var/lib/ceph/osd/ceph-2/current/3.14_head/
[email protected]:/var/lib/ceph/osd/ceph-7/current/3.14_head/
(with osd.7 stopped before and restarted after the rsync)
root@wein:~# ceph pg deep-scrub 3.14
instructing pg 3.14 on osd.2 to deep-scrub
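If nothing else works, the only remaining option we can see would be telling the cluster that the missing data is really gone. This is destructive and untested on our side, so only a sketch (the stop command depends on the init system):

```shell
# LAST RESORT, discards data: mark osd.7 out, stop it, then declare it
# lost so that peering on pg 3.14 stops waiting for its (deleted) copy.
ceph osd out 7
stop ceph-osd id=7    # upstart; or: service ceph stop osd.7
ceph osd lost 7 --yes-i-really-mean-it
```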
--
New PGP key: 659B 0D91 E86E 7E24 FD15 69D0 C729 21A1 293F 2D24