Going by your ceph status it seems that you have 19 pools; are all of
them erasure coded as 3+2?
It seems that when you took the node offline, Ceph could move some of
the PGs to other nodes (apparently one or more pools do not require all
5 hosts to be healthy; maybe they are replicated, or not 3+2 erasure
coded?). You can check this as sketched below.
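A quick way to verify the pool types (command names as in Luminous; the
pool name is just a placeholder):

    ceph osd pool ls detail                          # shows replicated vs. erasure per pool, plus size / EC profile
    ceph osd pool get <pool> erasure_code_profile    # for an EC pool, prints which profile it uses

A pool listed as "replicated size 3" can stay clean with one of five
hosts down, while a 3+2 EC pool with a host failure domain cannot.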
These PGs are the active+clean+remapped ones. (Ceph could successfully
put these on other OSDs to maintain the replica count / erasure coding
profile, and the remapping process completed.)
Some other PGs do seem to require all 5 hosts to be present; these are
the "undersized" ones.
One other thing: if your failure domain is osd and not host or a larger
unit, then Ceph will not try to place all replicas on different
servers, just on different OSDs, so it can satisfy the placement
criteria even if one of the hosts is down. That setting would be highly
inadvisable on a production system!
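To confirm the failure domain, dump the CRUSH rule the pool uses; the
rule name below is only a guess, you can list yours with
ceph osd crush rule ls:

    ceph osd crush rule dump <rule-name>

In the output, the "type" field of the chooseleaf/choose step is the
failure domain: "host" means each chunk/replica must land on a
different host, "osd" only on a different OSD.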
Denes.
On 11/30/2017 02:45 PM, David Turner wrote:
active+clean+remapped is not a healthy state for a PG. If it were
actually going to a new osd it would say backfill+wait or backfilling,
and it would eventually get back to active+clean.
I'm not certain what the active+clean+remapped state means. Perhaps a
PG query, PG dump, etc. can give more insight. In any case, this is not
a healthy state, and you're still testing by removing a node, leaving
you with fewer nodes than you need to be healthy.
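For instance, querying one of the remapped PGs (the PG id below is made
up) should show its "up" and "acting" OSD sets; as far as I know,
remapped means the acting set differs from the up set CRUSH computed:

    ceph pg 1.2f query

If the acting set still satisfies the pool's size / EC profile, the PG
can report active+clean+remapped without any backfill running.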
On Thu, Nov 30, 2017, 5:38 AM Jakub Jaszewski
<[email protected]> wrote:
I've just done a ceph upgrade jewel -> luminous and am facing the
same case...
# EC profile
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=3
m=2
plugin=jerasure
technique=reed_sol_van
w=8
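(For reference, a profile like the one above can be printed with the
following, where the profile name is a placeholder:

    ceph osd erasure-code-profile get <profile-name>
)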
There are 5 hosts in the cluster and I ran systemctl stop ceph.target
on one of them. Some PGs from the EC pool were remapped
(active+clean+remapped state) even though there were not enough hosts
in the cluster, but some are still in the active+undersized+degraded
state.
root@host01:~# ceph status
  cluster:
    id:     a6f73750-1972-47f6-bcf5-a99753be65ad
    health: HEALTH_WARN
            Degraded data redundancy: 876/9115 objects degraded (9.611%),
            540 pgs unclean, 540 pgs degraded, 540 pgs undersized

  services:
    mon: 3 daemons, quorum host01,host02,host03
    mgr: host01(active), standbys: host02, host03
    osd: 60 osds: 48 up, 48 in; 484 remapped pgs
    rgw: 3 daemons active

  data:
    pools:   19 pools, 3736 pgs
    objects: 1965 objects, 306 MB
    usage:   5153 MB used, 174 TB / 174 TB avail
    pgs:     876/9115 objects degraded (9.611%)
             2712 active+clean
             540  active+undersized+degraded
             484  active+clean+remapped

  io:
    client: 17331 B/s rd, 20 op/s rd, 0 op/s wr

root@host01:~#
Anyone here able to explain this behavior to me?
Jakub
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com