I ran a ceph osd reweight-by-utilization yesterday and partway through
had a network interruption. After the network was restored the cluster
continued to rebalance, but this morning the cluster has stopped
rebalancing and the status will not change from:
# ceph status
    cluster af859ff1-c394-4c9a-95e2-0e0e4c87445c
     health HEALTH_WARN
            1 pgs degraded
            1 pgs stuck degraded
            2 pgs stuck unclean
            1 pgs stuck undersized
            1 pgs undersized
            recovery 8163/66089054 objects degraded (0.012%)
            recovery 8194/66089054 objects misplaced (0.012%)
     monmap e24: 3 mons at {mon1=10.0.231.53:6789/0,mon2=10.0.231.54:6789/0,mon3=10.0.231.55:6789/0}
            election epoch 250, quorum 0,1,2 mon1,mon2,mon3
     osdmap e184486: 100 osds: 100 up, 100 in; 1 remapped pgs
      pgmap v3010985: 4144 pgs, 7 pools, 125 TB data, 32270 kobjects
            251 TB used, 111 TB / 363 TB avail
            8163/66089054 objects degraded (0.012%)
            8194/66089054 objects misplaced (0.012%)
                4142 active+clean
                   1 active+undersized+degraded
                   1 active+remapped
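For reference, reweight-by-utilization only changes the per-OSD override weights, which show up in the REWEIGHT column of ceph osd tree, so the state the interrupted run left behind can be checked with (full 100-OSD output omitted here):
# ceph osd tree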
# ceph health detail
HEALTH_WARN 1 pgs degraded; 1 pgs stuck degraded; 2 pgs stuck unclean; 1 pgs stuck undersized; 1 pgs undersized; recovery 8163/66089054 objects degraded (0.012%); recovery 8194/66089054 objects misplaced (0.012%)
pg 2.e7f is stuck unclean for 65125.554509, current state active+remapped, last acting [58,5]
pg 2.782 is stuck unclean for 65140.681540, current state active+undersized+degraded, last acting [76]
pg 2.782 is stuck undersized for 60568.221461, current state active+undersized+degraded, last acting [76]
pg 2.782 is stuck degraded for 60568.221549, current state active+undersized+degraded, last acting [76]
pg 2.782 is active+undersized+degraded, acting [76]
recovery 8163/66089054 objects degraded (0.012%)
recovery 8194/66089054 objects misplaced (0.012%)
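To see where CRUSH currently wants these two PGs (the up set) versus where they are actually being served from (the acting set), the per-PG mapping can also be pulled with ceph pg map; output not pasted here:
# ceph pg map 2.e7f
# ceph pg map 2.782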
# ceph pg 2.e7f query
"recovery_state": [
{
"name": "Started\/Primary\/Active",
"enter_time": "2015-08-11 15:43:09.190269",
"might_have_unfound": [],
"recovery_progress": {
"backfill_targets": [],
"waiting_on_backfill": [],
"last_backfill_started": "0\/\/0\/\/-1",
"backfill_info": {
"begin": "0\/\/0\/\/-1",
"end": "0\/\/0\/\/-1",
"objects": []
},
"peer_backfill_info": [],
"backfills_in_flight": [],
"recovering": [],
"pg_backend": {
"pull_from_peer": [],
"pushing": []
}
},
"scrub": {
"scrubber.epoch_start": "0",
"scrubber.active": 0,
"scrubber.waiting_on": 0,
"scrubber.waiting_on_whom": []
}
},
{
"name": "Started",
"enter_time": "2015-08-11 15:43:04.955796"
}
],
# ceph pg 2.782 query
"recovery_state": [
{
"name": "Started\/Primary\/Active",
"enter_time": "2015-08-11 15:42:42.178042",
"might_have_unfound": [
{
"osd": "5",
"status": "not queried"
}
],
"recovery_progress": {
"backfill_targets": [],
"waiting_on_backfill": [],
"last_backfill_started": "0\/\/0\/\/-1",
"backfill_info": {
"begin": "0\/\/0\/\/-1",
"end": "0\/\/0\/\/-1",
"objects": []
},
"peer_backfill_info": [],
"backfills_in_flight": [],
"recovering": [],
"pg_backend": {
"pull_from_peer": [],
"pushing": []
}
},
"scrub": {
"scrubber.epoch_start": "0",
"scrubber.active": 0,
"scrubber.waiting_on": 0,
"scrubber.waiting_on_whom": []
}
},
{
"name": "Started",
"enter_time": "2015-08-11 15:42:41.139709"
}
],
"agent_state": {}
I tried restarting osd.5/58/76, but there was no change.
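Would explicitly marking the acting OSDs down in the osdmap, so that they re-peer when they report back in, be a reasonable next step (assuming osd.58 and osd.76 are otherwise healthy)? e.g.:
# ceph osd down 58
# ceph osd down 76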
Any suggestions?