As per your ceph status, it seems you have 19 pools. Are all of them erasure coded as 3+2?

It seems that when you took the node offline, Ceph could move some of the PGs to other nodes (so one or more pools apparently do not require all 5 hosts to be healthy; maybe they are replicated, or not 3+2 erasure coded?).

These PGs are the active+clean+remapped ones. (Ceph could successfully place these on other OSDs to maintain the replica count / erasure-coding profile, and this remapping process completed.)

Some other PGs do seem to require all 5 hosts to be present; these are the "undersized" ones.
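A rough way to see why: a k+m erasure-coded pool with failure-domain=host wants one shard per host, so a 3+2 pool needs 5 hosts for a complete placement. This is only a counting sketch, not Ceph's real CRUSH logic, and it ignores min_size (which by default is k+1 and also affects whether a PG stays active):

```python
# Sketch of EC PG health versus available hosts (failure-domain=host).
# Not Ceph's actual placement logic -- just the counting argument.

def ec_pg_state(k: int, m: int, hosts_up: int) -> str:
    """Rough PG state for a k+m EC pool whose failure domain is host."""
    shards_placeable = min(k + m, hosts_up)  # at most one shard per host
    if shards_placeable < k:
        return "inactive"            # fewer shards than data chunks: no I/O
    if shards_placeable < k + m:
        return "active+undersized"   # serving I/O, but shards are missing
    return "active+clean"

# 3+2 pool on 5 hosts: clean with all hosts up, undersized with one down.
print(ec_pg_state(3, 2, 5))  # active+clean
print(ec_pg_state(3, 2, 4))  # active+undersized
```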


One other thing: if your failure domain is osd rather than host (or a larger unit), then Ceph will not try to place all replicas on different servers, just on different OSDs, so it can satisfy the placement criteria even when one of the hosts is down. That setting would be highly inadvisable on a production system!
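If in doubt, the failure domain is visible in the CRUSH rules, and the pool listing shows which pools are replicated versus erasure coded. Something like:

```shell
# Show each pool's type (replicated vs erasure) and which crush rule it uses
ceph osd pool ls detail

# Dump the CRUSH rules; the bucket type in the chooseleaf/choose step
# ("host" vs "osd") is the failure domain
ceph osd crush rule dump
```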


Denes.

On 11/30/2017 02:45 PM, David Turner wrote:

active+clean+remapped is not a healthy state for a PG. If it were actually going to a new OSD, it would say backfill_wait or backfilling, and eventually it would get back to active+clean.

I'm not certain what the active+clean+remapped state means here. Perhaps a pg query, pg dump, etc. can give more insight. In any case, this is not a healthy state, and you're still testing by removing a node so that you have fewer failure domains than you need to be healthy.
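For that kind of digging, the usual commands would be something like the following (the PG id below is a placeholder; substitute one from your own cluster):

```shell
# List PGs stuck in the undersized state
ceph pg dump_stuck undersized

# Detailed state, acting set, and up set of a single PG
# (2.1f is a made-up example id)
ceph pg 2.1f query
```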


On Thu, Nov 30, 2017, 5:38 AM Jakub Jaszewski <[email protected] <mailto:[email protected]>> wrote:

    I've just done a ceph upgrade jewel -> luminous and am facing the
    same case...

    # EC profile
    crush-failure-domain=host
    crush-root=default
    jerasure-per-chunk-alignment=false
    k=3
    m=2
    plugin=jerasure
    technique=reed_sol_van
    w=8
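(A profile like the one above can be printed with the erasure-code-profile commands; "default" below is an assumption, use the profile name your pool actually references:)

```shell
# List the EC profiles known to the cluster, then print one of them
ceph osd erasure-code-profile ls
ceph osd erasure-code-profile get default
```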

    5 hosts in the cluster, and I ran systemctl stop ceph.target on one
    of them.
    Some PGs from the EC pool were remapped (active+clean+remapped state)
    even though there were not enough hosts in the cluster, but some are
    still in the active+undersized+degraded state.


    root@host01:~# ceph status
      cluster:
        id: a6f73750-1972-47f6-bcf5-a99753be65ad
        health: HEALTH_WARN
                Degraded data redundancy: 876/9115 objects degraded
    (9.611%), 540 pgs unclean, 540 pgs degraded, 540 pgs undersized
      services:
        mon: 3 daemons, quorum host01,host02,host03
        mgr: host01(active), standbys: host02, host03
        osd: 60 osds: 48 up, 48 in; 484 remapped pgs
        rgw: 3 daemons active
      data:
        pools:   19 pools, 3736 pgs
        objects: 1965 objects, 306 MB
        usage:   5153 MB used, 174 TB / 174 TB avail
        pgs:     876/9115 objects degraded (9.611%)
                 2712 active+clean
                 540  active+undersized+degraded
                 484  active+clean+remapped
      io:
        client:   17331 B/s rd, 20 op/s rd, 0 op/s wr
    root@host01:~#



    Anyone here able to explain this behavior to me ?

    Jakub
    _______________________________________________
    ceph-users mailing list
    [email protected] <mailto:[email protected]>
    http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


