Hello!

A few years ago I built a "dc-a:12 + dc-b:12 = 24" node Ceph cluster
on Nautilus v14.2.16.
A year ago the cluster was upgraded to Octopus and it ran fine.
Recently I added 4+4=8 new nodes with identical hardware and SSD drives.
When I created the OSDs under Octopus, the cluster usage jumped from
50% to 78%!

The weird part is that the new OSDs become nearfull and hold more data
than the old ones, even though they have the same number of PGs or fewer.

I had to reweight the new OSDs to 0.9 to even out the usage.
I then increased the PG count from 8192 to 16384 and ran the balancer;
that made things worse and I'm now at 84% usage!
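
For reference, what I ran was roughly the following; the OSD id, the
pool name and the upmap balancer mode are just placeholders/assumptions
here, not my exact values:

  ceph osd df tree                        # compare USE% and PG count per OSD
  ceph osd reweight osd.250 0.9           # override reweight on a too-full new OSD
  ceph osd pool set mypool pg_num 16384   # pgp_num follows automatically since Nautilus
  ceph balancer mode upmap
  ceph balancer on
  ceph balancer status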

My guess is that the OSD or PG code changed between Nautilus and
Octopus and that this is what causes the problem.

Can anyone with experience or knowledge of this help me?
What should I do?

My idea for a solution:
I'm thinking of destroying and re-creating the old OSDs, but that means
re-creating 144 x 3.8 TB SAS SSD OSDs, which would be 4-5 days of
maintenance.
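
The per-OSD cycle would be roughly the usual replace procedure; this is
only a sketch, and osd.N, /dev/sdX and the non-containerized systemd
unit are placeholders/assumptions:

  ceph osd out osd.N                               # drain the OSD
  # wait for backfill to finish / HEALTH_OK
  systemctl stop ceph-osd@N
  ceph osd destroy osd.N --yes-i-really-mean-it
  ceph-volume lvm zap --destroy /dev/sdX           # wipe the old LVs
  ceph-volume lvm create --data /dev/sdX           # or lvm batch, see below

As far as I know, ceph-volume lvm prepare/create can also take --osd-id
to reuse the destroyed id, so the CRUSH map stays stable.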

Also, I run 2 OSDs per drive because that was the recommendation back
in the Nautilus days. Should I keep this layout, or switch to 1 OSD per
3.8 TB SAS SSD? What is the recommendation for Octopus and Quincy?
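
For reference, the difference between the two layouts is just the
ceph-volume batch flag (device names below are placeholders):

  # 2 OSDs per SSD (what I have now)
  ceph-volume lvm batch --osds-per-device 2 /dev/sdb /dev/sdc
  # 1 OSD per SSD
  ceph-volume lvm batch --osds-per-device 1 /dev/sdb /dev/sdc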

- Best regards.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

Reply via email to