Hi there!
Recently I made our cluster rack-aware
by adding rack buckets to the CRUSH map.
The failure domain was, and still is,
"host".
rule cephfs2_data {
    id 7
    type erasure
    min_size 3
    max_size 6
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take PRZ
    step chooseleaf indep 0 type host
    step emit
}
Then I moved the hosts into the new
rack buckets of the CRUSH map, matching
their physical placement, with:
# ceph osd crush move onodeX rack=XYZ
for all hosts.
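A sketch of how that can be scripted (the host and rack names below are
placeholders, not our real layout; the commands are printed for review
rather than run, so pipe the output to "sh" to actually apply them):

```shell
# Sketch: move each host under its rack bucket in the CRUSH map.
# Host/rack pairs are placeholders -- substitute the real layout.
# Commands are echoed for review; pipe to "sh" to apply for real.
crush_moves() {
  while read -r host rack; do
    echo ceph osd crush move "$host" rack="$rack"
  done <<'EOF'
onode1 rack1
onode2 rack1
onode3 rack2
EOF
}
crush_moves
```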
The cluster then started rebalancing
the data, and has now ended up with:
HEALTH_WARN 1 filesystem is degraded; Reduced data availability: 2 pgs
inactive; Degraded data redundancy: 678/2371785 objects degraded (0.029%), 2
pgs degraded, 2 pgs undersized
FS_DEGRADED 1 filesystem is degraded
fs cephfs_1 is degraded
PG_AVAILABILITY Reduced data availability: 2 pgs inactive
pg 21.2e4 is stuck inactive for 142792.952697, current state
activating+undersized+degraded+remapped+forced_backfill, last acting
[5,2147483647,25,28,11,2]
pg 23.5 is stuck inactive for 142791.437243, current state
activating+undersized+degraded+remapped+forced_backfill, last acting [13,21]
PG_DEGRADED Degraded data redundancy: 678/2371785 objects degraded (0.029%), 2
pgs degraded, 2 pgs undersized
pg 21.2e4 is stuck undersized for 142779.321192, current state
activating+undersized+degraded+remapped+forced_backfill, last acting
[5,2147483647,25,28,11,2]
pg 23.5 is stuck undersized for 142789.747915, current state
activating+undersized+degraded+remapped+forced_backfill, last acting [13,21]
The cluster hosts a CephFS which is
no longer mountable.
I tried a few things (as you can see
from the forced_backfill state), but
without success.
The cephfs_data pool is EC 4+2, so any
four of the six shards should suffice
to reconstruct the rest. (The
2147483647 in the acting set above is
Ceph's placeholder for a shard that
could not be mapped to any OSD.)
Both inactive PGs seem to have enough
shards available to recalculate the
contents for all OSDs.
Is there a chance to get both PGs
clean again?
How can I force the PGs to rebuild
all the missing shards?
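Concretely, the inspection commands that seem to apply here (PG IDs taken
from the health output above; `ceph pg repeer` may not exist on older
releases). They are echoed for review since they need a live cluster:

```shell
# Sketch: inspect the two stuck PGs (IDs from the health output above).
# Commands are printed rather than executed; run them on a monitor node.
stuck_pg_checks() {
  for pg in 21.2e4 23.5; do
    echo ceph pg "$pg" query      # peering state, which shards are missing
    echo ceph pg repeer "$pg"     # re-run peering (if your release has it)
  done
  echo ceph pg dump_stuck inactive
}
stuck_pg_checks
```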
Thanks
Lars
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com