Hello, I attempted to increase the number of placement groups of the pools in our test cluster, and now "ceph status" (below) is reporting problems. I am not sure what is going on or how to fix this, and the troubleshooting scenarios in the docs don't quite match what I am seeing.
I have no idea how to begin debugging this. I see OSDs listed in the
"blocked_by" field of "ceph pg dump", but I don't know how to interpret
that. Could somebody assist, please?
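For what it's worth, one way I've been trying to read the dump is to tally which OSDs appear most often in "blocked_by" across the stuck PGs, on the assumption that the JSON is a list of PG entries each carrying "pgid" and a "blocked_by" list, as in "ceph pg dump" output. The sample below is made-up data standing in for the attached stuck.json:

```python
# Sketch: count how often each OSD appears in "blocked_by" across stuck PGs.
# Field names assumed from "ceph pg dump -f json"; the sample is invented.
import json
from collections import Counter

sample = json.loads("""
[
  {"pgid": "6.1a", "state": "remapped+peering", "blocked_by": [12, 37]},
  {"pgid": "6.2f", "state": "activating+undersized+degraded+remapped", "blocked_by": [37]},
  {"pgid": "6.30", "state": "stale+active+clean", "blocked_by": []}
]
""")

counts = Counter(osd for pg in sample for osd in pg.get("blocked_by", []))
for osd, n in counts.most_common():
    print(f"osd.{osd} blocks {n} PG(s)")
# With the sample above: osd.37 blocks 2 PG(s), osd.12 blocks 1 PG(s)
```

If one or two OSDs dominate the tally, that's presumably where to look first; but I don't know whether that's the right way to interpret the field.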
I attached output of "ceph pg dump_stuck -f json-pretty" just in case.
The cluster consists of 5 hosts, each with 16 HDDs and 4 SSDs. I am
running Ceph 13.2.2 (Mimic).
This is the affected pool:
pool 6 'fs-data-ec-ssd' erasure size 5 min_size 4 crush_rule 6
object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 2493 lfor
0/2491 flags hashpspool,ec_overwrites stripe_width 12288 application cephfs
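If I'm reading the pool line right (assuming the default 4 KiB erasure-code stripe unit), stripe_width 12288 implies k = 3 data chunks, so with size 5 that's m = 2 coding chunks, and min_size 4 = k + 1 means a PG goes inactive once fewer than 4 of its 5 shards are up:

```python
# Sanity check of the EC profile implied by the pool listing above.
STRIPE_UNIT = 4096          # assumed default erasure-code stripe unit
stripe_width = 12288        # from the pool listing
size, min_size = 5, 4       # from the pool listing

k = stripe_width // STRIPE_UNIT   # data chunks
m = size - k                      # coding chunks
print(f"k={k}, m={m}, min_size={min_size} (= k + {min_size - k})")
```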
Thanks,
Vlad
ceph status
  cluster:
    id:     47caa1df-42be-444d-b603-02cad2a7fdd3
    health: HEALTH_WARN
            Reduced data availability: 155 pgs inactive, 47 pgs peering, 64 pgs stale
            Degraded data redundancy: 321039/114913606 objects degraded (0.279%), 108 pgs degraded, 108 pgs undersized

  services:
    mon: 5 daemons, quorum ceph-1,ceph-2,ceph-3,ceph-4,ceph-5
    mgr: ceph-3(active), standbys: ceph-2, ceph-5, ceph-1, ceph-4
    mds: cephfs-1/1/1 up {0=ceph-5=up:active}, 4 up:standby
    osd: 100 osds: 100 up, 100 in; 165 remapped pgs

  data:
    pools:   6 pools, 5120 pgs
    objects: 22.98 M objects, 88 TiB
    usage:   154 TiB used, 574 TiB / 727 TiB avail
    pgs:     3.027% pgs not active
             321039/114913606 objects degraded (0.279%)
             4903 active+clean
             105  activating+undersized+degraded+remapped
             61   stale+active+clean
             47   remapped+peering
             3    stale+activating+undersized+degraded+remapped
             1    active+clean+scrubbing+deep
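The numbers do seem internally consistent, at least: the 155 inactive PGs are exactly the ones in activating or peering states, which matches the 3.027% figure:

```python
# Cross-check of the status output above: inactive PGs are those still
# activating or peering; they should sum to the 155 reported.
states = {
    "active+clean": 4903,
    "activating+undersized+degraded+remapped": 105,
    "stale+active+clean": 61,
    "remapped+peering": 47,
    "stale+activating+undersized+degraded+remapped": 3,
    "active+clean+scrubbing+deep": 1,
}
total = sum(states.values())                            # 5120 PGs
inactive = sum(n for s, n in states.items()
               if "activating" in s or "peering" in s)  # 155
print(f"{inactive} of {total} PGs inactive ({100 * inactive / total:.3f}%)")
```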
[Attachment: stuck.json.gz (application/gzip)]
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
