In my test environment I changed the reweight of an OSD. Afterwards,
some PGs got stuck in the 'active+remapped' state. I can only repair
this by reverting the reweight to its old value.
Here is my ceph tree:
> # id    weight  type name               up/down reweight
> -1      12      root default
> -4      12          room serverroom
> -2      12              host test1
> 0       2                   osd.0       up      0.7439
> 1       2                   osd.1       up      0.9
> 2       4                   osd.2       up      1
> 3       4                   osd.3       up      1
> -3      0               host test2
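Incidentally, the odd-looking value 0.7439 on osd.0 is a side effect of how
Ceph stores the override reweight internally: as 16.16 fixed point, where
0x10000 represents 1.0, so CLI values get rounded to the nearest 1/65536. A
minimal sketch of that encoding (the `encode_reweight` helper is hypothetical,
for illustration only):

```python
# Ceph stores the override reweight as 16.16 fixed point (0x10000 == 1.0),
# so values set on the CLI are rounded to the nearest 1/65536.
def encode_reweight(w: float) -> int:
    """Hypothetical helper mirroring the fixed-point encoding."""
    return int(w * 0x10000)

print(encode_reweight(1.0))   # 65536
print(encode_reweight(0.9))   # 58982
```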
I changed osd.1 from 1.0 to 0.9 and then this happened:
> # ceph health detail
> HEALTH_WARN 10 pgs stuck unclean; recovery 94/2976 objects misplaced (3.159%)
> pg 6.4 is stuck unclean for 1135.549938, current state active+remapped, last acting [1,2,3]
> [...]
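As a sanity check on the health output, the misplaced percentage is simply the
ratio reported on the same line:

```python
# "recovery 94/2976 objects misplaced (3.159%)" from ceph health detail
misplaced, total = 94, 2976
print(f"{100 * misplaced / total:.3f}%")  # 3.159%
```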
ceph pg dump shows the primary OSD of PG 6.4 as nonexistent (MAXINT). I
have no idea what happened here.
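For anyone grepping the dump: that "nonexistent" primary appears as the largest
signed 32-bit integer, which Ceph uses as a sentinel meaning "no OSD mapped":

```python
# Ceph's "no OSD" sentinel in pg dump output is the max signed 32-bit int.
NO_OSD = 2**31 - 1
print(NO_OSD)  # 2147483647
```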
Pool 6 is an erasure-coded pool (k=2, m=1). Here is the last part of the
query output for PG 6.4:
> # ceph pg 6.4 query
> [...]
> "recovery_state": [
>       { "name": "Started\/Primary\/Active",
>         "enter_time": "2015-01-03 17:16:10.054846",
>         "might_have_unfound": [],
>         "recovery_progress": { "backfill_targets": [],
>             "waiting_on_backfill": [],
>             "last_backfill_started": "0\/\/0\/\/-1",
>             "backfill_info": { "begin": "0\/\/0\/\/-1",
>                 "end": "0\/\/0\/\/-1",
>                 "objects": []},
>             "peer_backfill_info": [],
>             "backfills_in_flight": [],
>             "recovering": [],
>             "pg_backend": { "recovery_ops": [],
>                 "read_ops": []}},
>         "scrub": { "scrubber.epoch_start": "0",
>             "scrubber.active": 0,
>             "scrubber.block_writes": 0,
>             "scrubber.waiting_on": 0,
>             "scrubber.waiting_on_whom": []}},
>       { "name": "Started",
>         "enter_time": "2015-01-03 17:16:09.073069"}],
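For context on the pool parameters: with k=2 data chunks and m=1 coding chunk,
CRUSH has to place k+m distinct chunks for every PG, and the pool tolerates m
OSD failures. A quick sketch of the arithmetic:

```python
# Erasure-code profile from the post: k data chunks, m coding chunks.
k, m = 2, 1
chunks_per_object = k + m          # distinct OSDs needed per PG: 3
failures_tolerated = m             # 1
space_overhead = (k + m) / k       # raw-to-usable ratio: 1.5
print(chunks_per_object, failures_tolerated, space_overhead)
```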
Any idea what happened, or have I done something wrong here?
Greetings!
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com