Hello,

Any chance that these OSDs were deployed with different bluestore_min_alloc_size settings?
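A quick way to compare one old and one new OSD is via the admin socket on their hosts (osd.240 / osd.324 below are just example IDs taken from your df output). Keep in mind the config values only show what an OSD would use at mkfs time; the value actually baked into an existing OSD can differ, and only some builds report it in the OSD metadata:

  # On the OSD hosts: what the running config says (hdd/ssd variants included)
  ceph daemon osd.240 config show | grep bluestore_min_alloc_size
  ceph daemon osd.324 config show | grep bluestore_min_alloc_size

  # If your build exposes the on-disk value in the OSD metadata, that is
  # authoritative for what the OSD was actually formatted with:
  ceph osd metadata 240 | grep -i alloc
  ceph osd metadata 324 | grep -i alloc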
Josh

On Mon, Jul 7, 2025 at 2:39 PM mhnx <morphinwith...@gmail.com> wrote:
>
> Hello Stefan!
>
> All of my nodes and clients = Octopus 15.2.14
>
> I have 1x RBD pool and 2000x RBD volumes of 100 GB each.
>
> This is the upmap-balanced state, without manual reweight:
>
> ID   CLASS  WEIGHT     REWEIGHT  SIZE     RAW USE   DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
>  -1         669.87897         -  671 TiB  381 TiB   376 TiB  170 GiB  5.2 TiB  289 TiB  56.87  1.00    -          root default
> -53         335.36298         -  335 TiB  192 TiB   189 TiB   85 GiB  2.6 TiB  144 TiB  57.15  1.00    -          datacenter E-datacenter
>
> **** OLD-NODE:
> -43          20.95900         -   21 TiB   11 TiB    10 TiB  5.4 GiB  180 GiB   10 TiB  50.66  0.89    -          host E10
> 240  ssd      1.74699   1.00000  1.7 TiB  728 GiB   714 GiB  425 MiB   14 GiB  1.0 TiB  40.70  0.72  125  up      osd.240
> 241  ssd      1.74699   1.00000  1.7 TiB  924 GiB   909 GiB  507 MiB   14 GiB  864 GiB  51.66  0.91  126  up      osd.241
> 242  ssd      1.74699   1.00000  1.7 TiB  913 GiB   898 GiB  513 MiB   15 GiB  876 GiB  51.04  0.90  131  up      osd.242
> 243  ssd      1.74699   1.00000  1.7 TiB  896 GiB   880 GiB  474 MiB   16 GiB  892 GiB  50.12  0.88  132  up      osd.243
> 244  ssd      1.74699   1.00000  1.7 TiB  842 GiB   826 GiB  411 MiB   16 GiB  947 GiB  47.06  0.83  133  up      osd.244
> 245  ssd      1.74699   1.00000  1.7 TiB  912 GiB   896 GiB  416 MiB   15 GiB  876 GiB  51.00  0.90  143  up      osd.245
> 246  ssd      1.74699   1.00000  1.7 TiB  940 GiB   925 GiB  535 MiB   15 GiB  848 GiB  52.58  0.92  143  up      osd.246
> 247  ssd      1.74699   1.00000  1.7 TiB  1008 GiB  993 GiB  436 MiB   15 GiB  781 GiB  56.35  0.99  135  up      osd.247
> 248  ssd      1.74699   1.00000  1.7 TiB  1.0 TiB   1.0 TiB  452 MiB   15 GiB  728 GiB  59.28  1.04  141  up      osd.248
> 249  ssd      1.74699   1.00000  1.7 TiB  826 GiB   812 GiB  375 MiB   14 GiB  962 GiB  46.21  0.81  128  up      osd.249
> 250  ssd      1.74699   1.00000  1.7 TiB  923 GiB   907 GiB  435 MiB   15 GiB  866 GiB  51.60  0.91  136  up      osd.250
> 251  ssd      1.74699   1.00000  1.7 TiB  900 GiB   884 GiB  567 MiB   15 GiB  889 GiB  50.30  0.88  142  up      osd.251
>
> **** NEW-NODE:
> -65          20.96375         -   21 TiB   16 TiB    16 TiB  5.4 GiB  125 GiB  5.1 TiB  75.47  1.33    -          host E14
> 324  ssd      1.74698   1.00000  1.7 TiB  1.4 TiB   1.3 TiB  431 MiB   10 GiB  399 GiB  77.72  1.37  124  up      osd.324
> 325  ssd      1.74698   1.00000  1.7 TiB  1.2 TiB   1.2 TiB  436 MiB  9.6 GiB  579 GiB  67.62  1.19  107  up      osd.325
> 326  ssd      1.74698   1.00000  1.7 TiB  1.3 TiB   1.3 TiB  446 MiB   10 GiB  495 GiB  72.35  1.27  107  up      osd.326
> 327  ssd      1.74698   1.00000  1.7 TiB  1.4 TiB   1.4 TiB  506 MiB   11 GiB  355 GiB  80.14  1.41  126  up      osd.327
> 328  ssd      1.74698   1.00000  1.7 TiB  1.3 TiB   1.3 TiB  432 MiB   10 GiB  477 GiB  73.33  1.29  114  up      osd.328
> 329  ssd      1.74698   1.00000  1.7 TiB  1.4 TiB   1.4 TiB  530 MiB   11 GiB  343 GiB  80.81  1.42  124  up      osd.329
> 330  ssd      1.74698   1.00000  1.7 TiB  1.2 TiB   1.2 TiB  432 MiB   10 GiB  537 GiB  69.99  1.23  113  up      osd.330
> 331  ssd      1.74698   1.00000  1.7 TiB  1.4 TiB   1.4 TiB  473 MiB   11 GiB  353 GiB  80.25  1.41  123  up      osd.331
> 332  ssd      1.74698   1.00000  1.7 TiB  1.4 TiB   1.4 TiB  459 MiB   11 GiB  370 GiB  79.30  1.39  124  up      osd.332
> 333  ssd      1.74698   1.00000  1.7 TiB  1.3 TiB   1.2 TiB  438 MiB   10 GiB  500 GiB  72.05  1.27  111  up      osd.333
> 334  ssd      1.74698   1.00000  1.7 TiB  1.4 TiB   1.4 TiB  433 MiB   11 GiB  393 GiB  78.00  1.37  123  up      osd.334
> 335  ssd      1.74698   1.00000  1.7 TiB  1.3 TiB   1.3 TiB  488 MiB   10 GiB  464 GiB  74.08  1.30  119  up      osd.335
>
> ---------------------
> I can't upgrade to newer versions because I have a personal project and
> it is designed for the current Linux and Ceph versions. An upgrade means
> a lot of work for me.
>
> Maybe the JJ balancer will do a better job, as you recommended, but I
> don't want better balance at this moment.
>
> First of all, I want to understand why this happened and what changed
> between Nautilus and Octopus such that the same OSD deployment method
> produces near-full new OSDs with a similar PG count.
>
> -Best
>
> Stefan Kooman <ste...@bit.nl> wrote on Mon, 7 Jul 2025 at 22:22:
> >
> > On 7/7/25 18:34, mhnx wrote:
> > > Hello!
> > >
> > > A few years ago I built a "dc-a:12 + dc-b:12 = 24" node Ceph cluster
> > > with Nautilus v14.2.16.
> > > A year ago the cluster was upgraded to Octopus and it was running fine.
> > > Recently I added 4+4=8 new nodes with identical hardware and SSD drives.
> > > When I created OSDs with Octopus, the cluster usage increased from 50%
> > > to 78%!!
> >
> > What does a "ceph osd df tree" give you?
> >
> > > The weird problem is that the new OSDs become nearfull and hold more data
> > > even when they have the same or a lower number of PGs.
> > >
> > > I had to reweight the new OSDs to 0.9 to make their usage equal.
> > > I increased the PG count from 8192 to 16384 and ran the balancer; it
> > > became worse and I have 84% usage now!
> >
> > Remember that Ceph is limited by the fullest OSD in the cluster.
> > Do you have old clients? If not, try to get rid of reweight and start
> > using upmap. It is way more efficient at getting a cluster well
> > balanced. I would recommend using this balancer script:
> > https://github.com/TheJJ/ceph-balancer
> >
> > Maybe first reset all the reweights (first do: ceph osd set nobackfill).
> > Then run this script:
> > https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py
> >
> > And after that run the ceph-balancer script. That should help
> > tremendously if the cluster is imbalanced.
> >
> > > I guess the OSD or PG code changed between Nautilus and Octopus and that
> > > is generating this problem.
> >
> > What version of Octopus are you running?
> >
> > > Can anyone help me with experience or knowledge about this?
> > > What should I do?
> > >
> > > My solution idea:
> > > I'm thinking of destroying and re-creating the old OSDs, but I would
> > > need to re-create 144x 3.8 TB SAS SSD OSDs, and that means 4-5 days of
> > > maintenance.
> > >
> > > Also, I have 2 OSDs per drive because that was recommended in the
> > > Nautilus days. Should I keep that layout, or should I use 1 OSD per
> > > 3.8 TB SAS SSD? What is the recommendation for Octopus and Quincy?
> >
> > I would recommend upgrading to newer, supported versions, maybe go to
> > Pacific and then Reef. Modern versions of Ceph do not gain from
> > deploying multiple OSDs per drive. What Ceph services are you running
> > (MDS, RGW, RBD)?
> >
> > Gr. Stefan
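For reference, the reset-reweights / upmap sequence Stefan describes above would look roughly like this. Treat it as a sketch, not a recipe: check both scripts' READMEs first, the OSD id is only an example, and the placementoptimizer.py flags may differ between versions:

  ceph osd set nobackfill                  # keep backfill from starting mid-change

  # reset any manual reweights back to 1.0 (repeat for every reweighted OSD):
  ceph osd reweight 324 1.0

  # upmap needs luminous or newer clients; you said everything is Octopus:
  ceph osd set-require-min-compat-client luminous

  # upmap-remapped.py prints "ceph osd pg-upmap-items ..." commands that pin the
  # remapped PGs back to where the data currently sits; rerun until it emits
  # nothing / no PGs are left in the remapped state:
  ./upmap-remapped.py | sh

  ceph osd unset nobackfill

  # then let the JJ balancer move a few PGs at a time (see its --help):
  ./placementoptimizer.py balance --max-pg-moves 10 | tee /tmp/balance-upmaps
  bash /tmp/balance-upmaps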