On 09/17/2018 04:33 PM, Gregory Farnum wrote:
On Mon, Sep 17, 2018 at 8:21 AM Graham Allan <g...@umn.edu> wrote:

    Looking back through the history, it seems that I *did* override the
    min_size for this pool; however, I didn't reduce it - it used to have
    min_size 2! That made no sense to me; I think it must be an artifact
    of a very early (hammer?) EC pool creation, but it pre-dates me.

    I found the documentation on what min_size should be a bit confusing,
    which is how I arrived at 4. I fully agree that k+1=5 makes far more
    sense.

    I don't think I was the only one confused by this though, e.g.
    http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026445.html

    I suppose the safest thing to do is to update min_size to 5 right away,
    to force any size-4 PGs down until they can recover. I can set
    force-recovery on these as well...


Mmm, this is embarrassing, but that actually doesn't quite work due to https://github.com/ceph/ceph/pull/24095, which has been on my task list (though near the bottom) for a while. :( So if your cluster is stable now, I'd let it clean up and then change the min_size once everything is repaired.
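
For reference, once everything is clean the change itself is just the usual pool command; the profile and pool names below are placeholders, so treat this as a sketch rather than a recipe:

    # confirm k and m for the pool's erasure-code profile (here k=4, m=2)
    ceph osd erasure-code-profile get <profile-name>

    # check the current value, then raise it to k+1 once all PGs are clean
    ceph osd pool get <ec-pool-name> min_size
    ceph osd pool set <ec-pool-name> min_size 5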

Thanks for your feedback, Greg. Since declaring the dead OSD as lost, the downed PG has come back active and is successfully serving data. The cluster is considerably more stable now; I've set force-backfill or force-recovery on any size=4 PGs and can wait for that to complete before changing anything else.
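
In case it's useful to anyone else, what I'm running looks roughly like this (PG ids are placeholders; force-backfill/force-recovery need Luminous or later, if I recall correctly):

    # find the PGs that are still undersized
    ceph pg dump pgs_brief | grep undersized

    # bump their backfill/recovery priority
    ceph pg force-backfill <pgid> [<pgid> ...]
    ceph pg force-recovery <pgid> [<pgid> ...]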

Thanks again,

Graham
--
Graham Allan
Minnesota Supercomputing Institute - g...@umn.edu
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
