Ah, I should have mentioned: size=3, min_size=1.

I'm pretty sure that 'down_osds_we_would_probe' is the problem, but it's
not clear if there's a way to fix that.
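
For reference, the two dead OSDs this PG still wants to probe are 7 and 10
(the down_osds_we_would_probe list in the pg query below).  If those ids were
still in the osdmap, the usual way to tell the cluster to stop waiting for
them would be something like this (a sketch only; it gives up whatever data
lived only on those OSDs):

$ ceph osd lost 7 --yes-i-really-mean-it
$ ceph osd lost 10 --yes-i-really-mean-it
$ ceph pg 18.c1 query | grep -A 4 down_osds_we_would_probe

But as described below, those OSDs were already removed from the map a few
days ago, so 'ceph osd lost' hasn't helped here.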



On Tue, Feb 9, 2016 at 11:30 PM Arvydas Opulskis <arvydas.opuls...@adform.com> wrote:

> Hi,
>
>
>
> What is min_size for this pool?  Maybe you need to decrease it for the
> cluster to start recovering.
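>
> For example (the pool name below is just a placeholder; substitute the
> name of the affected pool):
>
> $ ceph osd pool get <pool-name> min_size
> $ ceph osd pool set <pool-name> min_size 1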
>
>
>
> Arvydas
>
>
>
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Scott Laird
> Sent: Wednesday, February 10, 2016 7:22 AM
> To: 'ceph-users@lists.ceph.com' (ceph-users@lists.ceph.com) <ceph-users@lists.ceph.com>
> Subject: [ceph-users] Can't fix down+incomplete PG
>
>
>
> I lost a few OSDs recently.  Now my cluster is unhealthy and I can't figure
> out how to get it healthy again.
>
>
>
> OSDs 3, 7, 10, and 40 died in a power outage.  Now I have 10 PGs that are
> down+incomplete, but all of them seem like they should have surviving
> replicas of all their data.
>
>
>
> I'm running 9.2.0.
>
>
>
> $ ceph health detail | grep down
> pg 18.c1 is down+incomplete, acting [11,18,9]
> pg 18.47 is down+incomplete, acting [11,9,22]
> pg 18.1d7 is down+incomplete, acting [5,31,24]
> pg 18.1d6 is down+incomplete, acting [22,11,5]
> pg 18.2af is down+incomplete, acting [19,24,18]
> pg 18.2dd is down+incomplete, acting [15,11,22]
> pg 18.2de is down+incomplete, acting [15,17,11]
> pg 18.3e is down+incomplete, acting [25,8,18]
> pg 18.3d6 is down+incomplete, acting [22,39,24]
> pg 18.3e6 is down+incomplete, acting [9,23,8]
>
>
>
> $ ceph pg 18.c1 query
> {
>     "state": "down+incomplete",
>     "snap_trimq": "[]",
>     "epoch": 960905,
>     "up": [
>         11,
>         18,
>         9
>     ],
>     "acting": [
>         11,
>         18,
>         9
>     ],
>     "info": {
>         "pgid": "18.c1",
>         "last_update": "0'0",
>         "last_complete": "0'0",
>         "log_tail": "0'0",
>         "last_user_version": 0,
>         "last_backfill": "MAX",
>         "last_backfill_bitwise": 0,
>         "purged_snaps": "[]",
>         "history": {
>             "epoch_created": 595523,
>             "last_epoch_started": 954170,
>             "last_epoch_clean": 954170,
>             "last_epoch_split": 0,
>             "last_epoch_marked_full": 0,
>             "same_up_since": 959988,
>             "same_interval_since": 959988,
>             "same_primary_since": 959988,
>             "last_scrub": "613947'7736",
>             "last_scrub_stamp": "2015-11-11 21:18:35.118057",
>             "last_deep_scrub": "613947'7736",
>             "last_deep_scrub_stamp": "2015-11-11 21:18:35.118057",
>             "last_clean_scrub_stamp": "2015-11-11 21:18:35.118057"
>         },
> ...
>             "probing_osds": [
>                 "9",
>                 "11",
>                 "18",
>                 "23",
>                 "25"
>             ],
>             "down_osds_we_would_probe": [
>                 7,
>                 10
>             ],
>             "peering_blocked_by": []
>         },
>         {
>             "name": "Started",
>             "enter_time": "2016-02-09 20:35:57.627376"
>         }
>     ],
>     "agent_state": {}
> }
>
>
>
> I tried replacing the disks.  I created new OSDs 3 and 7, but neither will
> start up; the ceph-osd process starts but never actually makes it to 'up',
> and there's nothing obvious in the logs.  I can post logs if that helps.
> Since the old OSDs were removed a few days ago, 'ceph osd lost' doesn't
> seem to help.
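>
> For reference, checking whether the old ids are still in the osdmap at all,
> and what state the re-created OSDs are in, with something like the
> following might narrow it down (the grep pattern is only illustrative):
>
> $ ceph osd tree
> $ ceph osd dump | grep -E 'osd\.(3|7|10|40)'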
>
>
>
> Is there a way to fix these PGs and get my cluster healthy again?
>
>
>
>
>
> Scott
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
