Ah, I should have mentioned: size=3, min_size=1. I'm pretty sure that 'down_osds_we_would_probe' is the problem, but it's not clear if there's a way to fix that.
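For the archives: the usual way to unblock peering when 'down_osds_we_would_probe' is non-empty is to declare those OSDs permanently lost. A rough sketch (this only takes effect while the IDs still exist in the osdmap, which is presumably why it does nothing for me now that the OSDs have already been removed):

$ ceph osd lost 7 --yes-i-really-mean-it    # tell peering to stop waiting for osd.7
$ ceph osd lost 10 --yes-i-really-mean-it   # same for osd.10
$ ceph pg 18.c1 query                       # re-check down_osds_we_would_probe

Note that 'osd lost' acknowledges possible data loss for any objects whose only surviving copies were on those OSDs.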
On Tue, Feb 9, 2016 at 11:30 PM Arvydas Opulskis <arvydas.opuls...@adform.com> wrote:

> Hi,
>
> What is min_size for this pool? Maybe you need to decrease it for the
> cluster to start recovering.
>
> Arvydas
>
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Scott Laird
> Sent: Wednesday, February 10, 2016 7:22 AM
> To: 'ceph-users@lists.ceph.com' (ceph-users@lists.ceph.com) <ceph-users@lists.ceph.com>
> Subject: [ceph-users] Can't fix down+incomplete PG
>
> I lost a few OSDs recently. Now my cluster is unhealthy and I can't
> figure out how to get it healthy again.
>
> OSDs 3, 7, 10, and 40 died in a power outage. Now I have 10 PGs that
> are down+incomplete, but all of them seem like they should have
> surviving replicas of all their data.
>
> I'm running 9.2.0.
>
> $ ceph health detail | grep down
> pg 18.c1 is down+incomplete, acting [11,18,9]
> pg 18.47 is down+incomplete, acting [11,9,22]
> pg 18.1d7 is down+incomplete, acting [5,31,24]
> pg 18.1d6 is down+incomplete, acting [22,11,5]
> pg 18.2af is down+incomplete, acting [19,24,18]
> pg 18.2dd is down+incomplete, acting [15,11,22]
> pg 18.2de is down+incomplete, acting [15,17,11]
> pg 18.3e is down+incomplete, acting [25,8,18]
> pg 18.3d6 is down+incomplete, acting [22,39,24]
> pg 18.3e6 is down+incomplete, acting [9,23,8]
>
> $ ceph pg 18.c1 query
> {
>     "state": "down+incomplete",
>     "snap_trimq": "[]",
>     "epoch": 960905,
>     "up": [
>         11,
>         18,
>         9
>     ],
>     "acting": [
>         11,
>         18,
>         9
>     ],
>     "info": {
>         "pgid": "18.c1",
>         "last_update": "0'0",
>         "last_complete": "0'0",
>         "log_tail": "0'0",
>         "last_user_version": 0,
>         "last_backfill": "MAX",
>         "last_backfill_bitwise": 0,
>         "purged_snaps": "[]",
>         "history": {
>             "epoch_created": 595523,
>             "last_epoch_started": 954170,
>             "last_epoch_clean": 954170,
>             "last_epoch_split": 0,
>             "last_epoch_marked_full": 0,
>             "same_up_since": 959988,
>             "same_interval_since": 959988,
>             "same_primary_since": 959988,
>             "last_scrub": "613947'7736",
>             "last_scrub_stamp": "2015-11-11 21:18:35.118057",
>             "last_deep_scrub": "613947'7736",
>             "last_deep_scrub_stamp": "2015-11-11 21:18:35.118057",
>             "last_clean_scrub_stamp": "2015-11-11 21:18:35.118057"
>         },
>         ...
>         "probing_osds": [
>             "9",
>             "11",
>             "18",
>             "23",
>             "25"
>         ],
>         "down_osds_we_would_probe": [
>             7,
>             10
>         ],
>         "peering_blocked_by": []
>     },
>     {
>         "name": "Started",
>         "enter_time": "2016-02-09 20:35:57.627376"
>     }
> ],
> "agent_state": {}
> }
>
> I tried replacing disks. I created new OSDs 3 and 7, but neither will
> start up; the ceph-osd process starts but never actually makes it to
> 'up', with nothing obvious in the logs. I can post logs if that helps.
> Since the OSDs were removed a few days ago, 'ceph osd lost' doesn't
> seem to help.
>
> Is there a way to fix these PGs and get my cluster healthy again?
>
> Scott
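For completeness, the check Arvydas suggested would look like this ('rbd' is only a placeholder pool name):

$ ceph osd pool get rbd min_size    # confirm the current value
$ ceph osd pool set rbd min_size 1  # let PGs go active with a single replica

As mentioned at the top, this pool is already at size=3/min_size=1, so there is nothing left to lower here.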