Hi,
So I am trying to remove the OSDs from one of our 6 Ceph OSD hosts. This is a
brand new cluster and there is no data on it yet. I was following the manual
procedure [1] with the script below. I removed OSDs 0-3, but I am seeing
Ceph not fully recover.
#!/bin/bash
# Remove a single OSD (id given as $1), per the manual procedure [1].
ceph osd out ${1}                 # mark the OSD out so data migrates off it
/etc/init.d/ceph stop osd.${1}    # stop the OSD daemon
ceph osd crush remove osd.${1}    # remove it from the CRUSH map
ceph auth del osd.${1}            # delete its cephx key
ceph osd rm ${1}                  # remove the OSD from the cluster
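For what it's worth, I just called that script once per OSD id, roughly like
this (remove-osd.sh is only what I'm calling it here):
for id in 0 1 2 3; do
    ./remove-osd.sh ${id}    # out, stop, crush remove, auth del, osd rm
done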
# ceph -v
ceph version 0.72.2-2-g5169d4e (5169d4e957791533e6c1c1aa83c15486d0e7afea)
# ceph status
cluster 7bdc37df-978c-4ddd-a3d4-97a06fc2b016
health HEALTH_WARN 8 pgs stuck inactive; 8 pgs stuck unclean
monmap e2: 3 mons at
{objmon00=192.168.22.30:6789/0,objmon01=192.168.22.31:6789/0,objmon02=192.168.22.32:6789/0},
election epoch 18, quorum 0,1,2 objmon00,objmon01,objmon02
osdmap e5567: 140 osds: 128 up, 128 in
pgmap v151073: 7612 pgs, 18 pools, 90085 kB data, 3244 objects
15978 MB used, 465 TB / 465 TB avail
8 inactive
7604 active+clean
# ceph health detail
HEALTH_WARN 8 pgs stuck inactive; 8 pgs stuck unclean
pg 19.10a is stuck inactive since forever, current state inactive, last acting [7,99,20]
pg 19.882 is stuck inactive since forever, current state inactive, last acting [9,46,64]
pg 19.124a is stuck inactive since forever, current state inactive, last acting [82,108,14]
pg 19.10db is stuck inactive for 1150.820893, current state inactive, last acting [63,54,72]
pg 19.7a2 is stuck inactive for 1150.868763, current state inactive, last acting [18,75,122]
pg 19.30a is stuck inactive for 1150.713369, current state inactive, last acting [107,75,16]
pg 19.142a is stuck inactive for 1229.702841, current state inactive, last acting [119,16,74]
pg 19.758 is stuck inactive for 1230.207810, current state inactive, last acting [23,136,81]
pg 19.10a is stuck unclean since forever, current state inactive, last acting [7,99,20]
pg 19.882 is stuck unclean since forever, current state inactive, last acting [9,46,64]
pg 19.124a is stuck unclean since forever, current state inactive, last acting [82,108,14]
pg 19.10db is stuck unclean for 1150.821256, current state inactive, last acting [63,54,72]
pg 19.7a2 is stuck unclean for 1150.869125, current state inactive, last acting [18,75,122]
pg 19.30a is stuck unclean for 1150.713731, current state inactive, last acting [107,75,16]
pg 19.142a is stuck unclean for 1229.703203, current state inactive, last acting [119,16,74]
pg 19.758 is stuck unclean for 1230.208172, current state inactive, last acting [23,136,81]
# ceph pg 19.10a query
{ "state": "inactive",
"epoch": 5567,
"up": [
7,
99,
20],
"acting": [
7,
99,
20],
"info": { "pgid": "19.10a",
...
"recovery_state": [
{ "name": "Started\/Primary\/Peering\/WaitActingChange",
"enter_time": "2014-01-30 15:53:10.229039",
"comment": "waiting for pg acting set to change"},
{ "name": "Started",
"enter_time": "2014-01-30 15:53:10.207856"}]}
What does "waiting for pg acting set to change" this only has a single
worthwhile hit on google for a bug a year old? I have no data at risk
on this cluster.
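Since there is no data at risk, I was thinking of just nudging one of the
stuck PGs to repeer, along these lines (using pg 19.10a, whose primary is
osd.7, as the example), unless someone can tell me what the state actually
means first:
ceph pg dump_stuck inactive    # list the stuck PGs and their acting sets
ceph osd down 7                # mark the primary down; the running daemon
                               # should reassert itself and force repeering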
[1] - http://ceph.com/docs/master/rados/operations/add-or-rm-osds/
Thanks,
derek
--
Derek T. Yarnell
University of Maryland
Institute for Advanced Computer Studies
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com