Got it.  If you were using, say, a rack failure domain and had weighted down 
one or more racks such that fewer than 9 racks retained normal weight, that 
could have been a factor.
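
For what it's worth, a quick way to eyeball how much normal-weight capacity 
each failure domain still has is plain `ceph osd tree` (the grep pattern here 
is just a guess; match whichever bucket type is your failure domain):

  # aggregate CRUSH weight per failure-domain bucket
  ceph osd tree | grep -E 'rack|host'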

The "max avail" figures are calculated based on the configured full ratio, 
relative to the single most-full OSD.  Is your balancer on?  Do you have any 
legacy reweight values that are < 1.000 ?  What does `ceph osd df | tail` show 
for std deviation?  Does `ceph osd df` show a wide spread in fullness such that 
some outlier OSD might be perturbing your results?
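
Concretely, the checks I mean (standard CLI; the awk column index is my 
assumption and may differ between releases):

  # is the balancer enabled, and in which mode?
  ceph balancer status
  # the summary lines report MIN/MAX VAR and STDDEV
  ceph osd df | tail
  # flag override reweights below 1.00000 (REWEIGHT is usually column 4)
  ceph osd df | awk '$4 + 0 > 0 && $4 + 0 < 1'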

> On Aug 18, 2025, at 6:46 PM, Justin Mammarella 
> <justin.mammare...@unimelb.edu.au> wrote:
> 
> Our failure domain is host.
> We currently have 46 hosts; 6 of them have OSDs that are weighted down to 
> near 0.
> 
> And a correction to my original email: we are using EC 6+3.
> 
> 
> From: Anthony D'Atri <anthony.da...@gmail.com>
> Date: Tuesday, 19 August 2025 at 2:14 am
> To: Justin Mammarella <justin.mammare...@unimelb.edu.au>
> Cc: Ceph Users <ceph-users@ceph.io>
> Subject: [EXT] Re: [ceph-users] MAX_AVAIL becomes 0 bytes when setting osd 
> crush weight to low value.
> 
> How many failure domains do you have? The downweighted hosts, are they spread 
> across failure domains?
> 
>> On Aug 18, 2025, at 10:28 AM, Justin Mammarella 
>> <justin.mammare...@unimelb.edu.au> wrote:
>> 
>> Hello,
>> 
>> We’re seeing the MAX_AVAIL value in `ceph df` instantaneously drop to 
>> 0 B / 100% full when specific OSDs have their crush weight set to low 
>> values.
>> 
>> The OSDs are otherwise healthy, and `ceph osd df` does not show their 
>> utilization above 70%.
>> 
>> ceph version 19.2.2
>> 
>> CLASS   SIZE    AVAIL    USED    RAW USED  %RAW USED
>> mf1hdd  19 PiB  8.9 PiB  10 PiB  10 PiB    53.83
>> 
>> to
>> 
>> POOL        ID  PGS    STORED   OBJECTS  USED    %USED   MAX AVAIL
>> mf1fs_data  1   16384  6.8 PiB  2.51G    10 PiB  100.00  0 B
>> 
>> We’re running a 9+3 EC pool.
>> 
>> This cluster has 1139 OSDs / 46 hosts.
>> 
>> We’re in the process of downsizing the cluster; draining nodes via crush 
>> reweight is part of our normal operations.
>> 
>> It happened once a few weeks ago, and we isolated the issue to the weight 
>> on a single OSD. Now it’s happening during rebalance on multiple OSDs: at 
>> some point the movement of PGs triggers an edge case that causes the MAX 
>> AVAIL calculation to fail when the crush weight is too low.
>> 
>> Example crush weights
>> 
>> 1418    mf1hdd      0.02000          osd.1418             up   1.00000  1.00000
>> 1419    mf1hdd      0.02000          osd.1419             up         0  1.00000
>> 2110    mf1hdd      0.02000          osd.2110             up   1.00000  1.00000
>> 2111    mf1hdd      0.02000          osd.2111             up         0  1.00000
>> 2112  nvmemeta      0.02000          osd.2112             up   1.00000  1.00000
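>> 
>> For reference, the drain steps behind those weights are ordinary crush 
>> reweights, along the lines of:
>> 
>>   # step the OSD's crush weight down as data drains off it
>>   ceph osd crush reweight osd.1418 0.02
>>   # and eventually to zero before removal
>>   ceph osd crush reweight osd.1418 0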
>> 
>> 
>> Any ideas before I file a bug report?
>> 
>> Thank you
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
