As per your ceph status, it seems you have 19 pools. Are all of them erasure coded as 3+2?

It seems that when you took the node offline, Ceph could move some of the PGs to other nodes (so one or more pools apparently do not require all 5 hosts to be healthy; maybe they are replicated, or not 3+2 erasure coded?).

These PGs are the active+clean+remapped ones. (Ceph could successfully place these on other OSDs to maintain the replica count / erasure-coding profile, and this remapping process completed.)

Some other PGs do seem to require all 5 hosts to be present; these are the "undersized" ones.
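A rough way to see why: a k+m erasure-coded pool with failure-domain=host wants one shard per host, so a 3+2 pool needs 5 hosts for a complete placement. This is only a counting sketch, not Ceph's real CRUSH logic, and it ignores min_size (which by default is k+1 and also affects whether a PG stays active):

```python
# Sketch of EC PG health versus available hosts (failure-domain=host).
# Not Ceph's actual placement logic -- just the counting argument.

def ec_pg_state(k: int, m: int, hosts_up: int) -> str:
    """Rough PG state for a k+m EC pool whose failure domain is host."""
    shards_placeable = min(k + m, hosts_up)  # at most one shard per host
    if shards_placeable < k:
        return "inactive"            # fewer shards than data chunks: no I/O
    if shards_placeable < k + m:
        return "active+undersized"   # serving I/O, but shards are missing
    return "active+clean"

# 3+2 pool on 5 hosts: clean with all hosts up, undersized with one down.
print(ec_pg_state(3, 2, 5))  # active+clean
print(ec_pg_state(3, 2, 4))  # active+undersized
```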


One other thing: if your failure domain is osd rather than host (or a larger unit), then Ceph will not try to place all replicas on different servers, just on different OSDs, so it can satisfy the placement criteria even when one of the hosts is down. That setting would be highly inadvisable on a production system!
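If in doubt, the failure domain is visible in the CRUSH rules, and the pool listing shows which pools are replicated versus erasure coded. Something like:

```shell
# Show each pool's type (replicated vs erasure) and which crush rule it uses
ceph osd pool ls detail

# Dump the CRUSH rules; the bucket type in the chooseleaf/choose step
# ("host" vs "osd") is the failure domain
ceph osd crush rule dump
```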


Denes.

On 11/30/2017 02:45 PM, David Turner wrote:

active+clean+remapped is not a healthy state for a PG. If it were actually going to a new OSD, it would say backfill_wait or backfilling, and eventually it would get back to active+clean.

I'm not certain what the active+clean+remapped state means here. Perhaps a pg query, pg dump, etc. can give more insight. In any case, this is not a healthy state, and you're still testing by removing a node so that you have fewer failure domains than you need to be healthy.
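For that kind of digging, the usual commands would be something like the following (the PG id below is a placeholder; substitute one from your own cluster):

```shell
# List PGs stuck in the undersized state
ceph pg dump_stuck undersized

# Detailed state, acting set, and up set of a single PG
# (2.1f is a made-up example id)
ceph pg 2.1f query
```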


On Thu, Nov 30, 2017, 5:38 AM Jakub Jaszewski <[email protected] <mailto:[email protected]>> wrote:

    I've just done a ceph upgrade jewel -> luminous and am facing the
    same case...

    # EC profile
    crush-failure-domain=host
    crush-root=default
    jerasure-per-chunk-alignment=false
    k=3
    m=2
    plugin=jerasure
    technique=reed_sol_van
    w=8
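(A profile like the one above can be printed with the erasure-code-profile commands; "default" below is an assumption, use the profile name your pool actually references:)

```shell
# List the EC profiles known to the cluster, then print one of them
ceph osd erasure-code-profile ls
ceph osd erasure-code-profile get default
```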

    5 hosts in the cluster, and I ran systemctl stop ceph.target on one
    of them.
    Some PGs from the EC pool were remapped (active+clean+remapped state)
    even though there were not enough hosts in the cluster, but some are
    still in the active+undersized+degraded state.


    root@host01:~# ceph status
      cluster:
        id: a6f73750-1972-47f6-bcf5-a99753be65ad
        health: HEALTH_WARN
                Degraded data redundancy: 876/9115 objects degraded
    (9.611%), 540 pgs unclean, 540 pgs degraded, 540 pgs undersized
      services:
        mon: 3 daemons, quorum host01,host02,host03
        mgr: host01(active), standbys: host02, host03
        osd: 60 osds: 48 up, 48 in; 484 remapped pgs
        rgw: 3 daemons active
      data:
        pools:   19 pools, 3736 pgs
        objects: 1965 objects, 306 MB
        usage:   5153 MB used, 174 TB / 174 TB avail
        pgs:     876/9115 objects degraded (9.611%)
                 2712 active+clean
                 540  active+undersized+degraded
                 484  active+clean+remapped
      io:
        client:   17331 B/s rd, 20 op/s rd, 0 op/s wr
    root@host01:~#



    Anyone here able to explain this behavior to me ?

    Jakub
    _______________________________________________
    ceph-users mailing list
    [email protected] <mailto:[email protected]>
    http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


