Hello,

We’re seeing the MAX AVAIL value in ceph df instantly drop to 0 B / 100% full 
when specific osds have their crush weight set to low values.

The osds are otherwise healthy, and ceph osd df does not show their utilization 
to be above 70%.

ceph version 19.2.2

CLASS    SIZE    AVAIL    USED    RAW USED  %RAW USED
mf1hdd   19 PiB  8.9 PiB  10 PiB  10 PiB    53.83

while at the same time the pool stats show:

POOL        ID  PGS    STORED   OBJECTS  USED    %USED   MAX AVAIL
mf1fs_data   1  16384  6.8 PiB  2.51G    10 PiB  100.00  0 B

We’re running a 9+3 EC pool.

This cluster has 1139 osds across 46 hosts.

We’re in the process of downsizing the cluster, and draining nodes via crush 
reweight is part of our normal operations.

It happened once a few weeks ago, and we isolated the issue to the crush weight 
on a single osd. Now it’s happening during rebalance on multiple osds: at some 
point the movement of PGs seems to trigger an edge case that causes the MAX 
AVAIL calculation to fail when the crush weight is too low.
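
For what it’s worth, here is a rough Python sketch of how I understand the 
per-rule MAX AVAIL projection works: take each osd’s free space minus the 
full_ratio headroom, divide it by that osd’s share of the rule’s total crush 
weight, keep the minimum across osds, and scale by k/(k+m) for the EC pool. 
This is only my mental model, not the actual mon code, and the osd count, 
sizes, weights and the 0.95 full_ratio below are made-up / assumed numbers:

FULL_RATIO = 0.95   # assuming the default mon full_ratio

def projected_max_avail(osds, ec_k=9, ec_m=3):
    """osds: list of (crush_weight, total_bytes, avail_bytes) tuples."""
    total_weight = sum(w for w, _, _ in osds)
    min_projection = None
    for weight, total, avail in osds:
        if weight == 0 or total == 0:
            continue                                  # skip out / zero-weight osds
        share = weight / total_weight                 # this osd's share of the rule
        headroom = total * (1.0 - FULL_RATIO)         # space reserved by full_ratio
        usable = max(0.0, avail - headroom)
        projection = usable / share                   # scale one osd up to the whole rule
        if min_projection is None or projection < min_projection:
            min_projection = projection
    if min_projection is None:
        return 0.0
    return min_projection * ec_k / (ec_k + ec_m)      # 9+3 EC: only 9/12 of raw is usable

TiB = 2**40

# ~1100 osds of ~17 TiB (crush weight ~17), roughly half full: a sane result.
cluster = [(17.0, 17 * TiB, 8 * TiB)] * 1100
print(f"{projected_max_avail(cluster) / 2**50:.2f} PiB")      # ~5.8 PiB

# A single osd reweighted to 0.02 that still holds data while draining:
# its share of the total weight is ~1e-6, so its own projection lands
# close to the signed 64-bit range.
total_weight = 1100 * 17.0 + 0.02
share = 0.02 / total_weight
usable = 8 * TiB - 17 * TiB * (1.0 - FULL_RATIO)
print(f"tiny osd: share={share:.2e}  projection={usable / share:.3e} bytes")
print(f"int64 max: {2**63 - 1:.3e}")

If that mental model is roughly right, an osd weighted 0.02 that still carries 
data projects a value uncomfortably close to the int64 limit, so my 
(unconfirmed) suspicion is an overflow or similar integer issue in that 
division path rather than the pool genuinely being full.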

Example crush weights:

  ID  CLASS      WEIGHT   NAME      STATUS  REWEIGHT  PRI-AFF
1418  mf1hdd     0.02000  osd.1418  up       1.00000  1.00000
1419  mf1hdd     0.02000  osd.1419  up             0  1.00000
2110  mf1hdd     0.02000  osd.2110  up       1.00000  1.00000
2111  mf1hdd     0.02000  osd.2111  up             0  1.00000
2112  nvmemeta   0.02000  osd.2112  up       1.00000  1.00000


Any ideas before I file a bug report?

Thank you