On 09/17/2018 04:33 PM, Gregory Farnum wrote:
On Mon, Sep 17, 2018 at 8:21 AM Graham Allan <g...@umn.edu> wrote:

    Looking back through the history, it seems that I *did* override the
    min_size for this pool; however, I didn't reduce it - it used to have
    min_size 2! That made no sense to me; I think it must be an artifact
    of a very early (hammer?) EC pool creation, but it pre-dates me.

    I found the documentation on what min_size should be a bit confusing,
    which is how I arrived at 4. I fully agree that k+1=5 makes far more
    sense.

    I don't think I was the only one confused by this though, e.g.
    http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026445.html

    I suppose the safest thing to do is to update min_size to 5 right away,
    to force any size-4 PGs down until they can recover. I can set
    force-recovery on these as well...


Mmm, this is embarrassing, but that actually doesn't quite work due to https://github.com/ceph/ceph/pull/24095, which has been on my task list (though near the bottom) for a while. :( So if your cluster is stable now, I'd let it clean up and then change the min_size once everything is repaired.
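
For reference, once everything is clean the change itself is just the usual pool command; the profile and pool names below are placeholders, so treat this as a sketch rather than a recipe:

    # confirm k and m for the pool's erasure-code profile (here k=4, m=2)
    ceph osd erasure-code-profile get <profile-name>

    # check the current value, then raise it to k+1 once all PGs are clean
    ceph osd pool get <ec-pool-name> min_size
    ceph osd pool set <ec-pool-name> min_size 5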

Thanks for your feedback, Greg. Since declaring the dead OSD as lost, the downed PG has come back active and is successfully serving data. The cluster is considerably more stable now; I've set force-backfill or force-recovery on any size=4 PGs and can wait for that to complete before changing anything else.
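
In case it's useful to anyone else, what I'm running looks roughly like this (PG ids are placeholders; force-backfill/force-recovery need Luminous or later, if I recall correctly):

    # find the PGs that are still undersized
    ceph pg dump pgs_brief | grep undersized

    # bump their backfill/recovery priority
    ceph pg force-backfill <pgid> [<pgid> ...]
    ceph pg force-recovery <pgid> [<pgid> ...]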

Thanks again,

Graham
--
Graham Allan
Minnesota Supercomputing Institute - g...@umn.edu
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
