Hello,

Any chance that these OSDs were deployed with different
bluestore_min_alloc_size settings?

Josh
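
For context on why this question matters: BlueStore rounds each object's on-disk footprint up to a multiple of bluestore_min_alloc_size, and that unit is fixed when the OSD is created (the defaults have changed across releases), so OSDs deployed at different times can report very different RAW USE for the same data. A minimal sketch of the resulting space amplification — the 4 KiB and 16 KiB values and the 6000-byte object are illustrative assumptions, not values from this cluster:

```python
def allocated(object_size: int, min_alloc: int) -> int:
    """On-disk bytes for one object: size rounded up to the allocation unit."""
    units = -(-object_size // min_alloc)  # ceiling division
    return units * min_alloc

obj = 6_000  # bytes; a hypothetical small object or overwrite
print(allocated(obj, 4 * 1024))    # 8192  (~1.4x amplification)
print(allocated(obj, 16 * 1024))   # 16384 (~2.7x amplification)
```

On a running OSD the configured value can be read with `ceph daemon osd.<id> config get bluestore_min_alloc_size_ssd`, but note that the effective value is the one baked in at mkfs time, which may differ from the current config.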

On Mon, Jul 7, 2025 at 2:39 PM mhnx <morphinwith...@gmail.com> wrote:
>
> Hello Stefan!
>
> All of my nodes and clients are on Octopus 15.2.14.
>
> I have 1 RBD pool and 2000 RBD volumes of 100 GB each.
>
>
> This is the upmap-balanced state, without manual reweights:
>
> ID   CLASS  WEIGHT     REWEIGHT  SIZE     RAW USE   DATA      OMAP     META     AVAIL     %USE   VAR   PGS  STATUS  TYPE NAME
>  -1         669.87897         -  671 TiB   381 TiB   376 TiB  170 GiB  5.2 TiB  289 TiB  56.87  1.00    -          root default
> -53         335.36298         -  335 TiB   192 TiB   189 TiB   85 GiB  2.6 TiB  144 TiB  57.15  1.00    -              datacenter E-datacenter
>
>  **** OLD-NODE:
> -43          20.95900         -   21 TiB    11 TiB    10 TiB  5.4 GiB  180 GiB    10 TiB  50.66  0.89    -                  host E10
> 240    ssd    1.74699   1.00000  1.7 TiB   728 GiB   714 GiB  425 MiB   14 GiB   1.0 TiB  40.70  0.72  125      up              osd.240
> 241    ssd    1.74699   1.00000  1.7 TiB   924 GiB   909 GiB  507 MiB   14 GiB   864 GiB  51.66  0.91  126      up              osd.241
> 242    ssd    1.74699   1.00000  1.7 TiB   913 GiB   898 GiB  513 MiB   15 GiB   876 GiB  51.04  0.90  131      up              osd.242
> 243    ssd    1.74699   1.00000  1.7 TiB   896 GiB   880 GiB  474 MiB   16 GiB   892 GiB  50.12  0.88  132      up              osd.243
> 244    ssd    1.74699   1.00000  1.7 TiB   842 GiB   826 GiB  411 MiB   16 GiB   947 GiB  47.06  0.83  133      up              osd.244
> 245    ssd    1.74699   1.00000  1.7 TiB   912 GiB   896 GiB  416 MiB   15 GiB   876 GiB  51.00  0.90  143      up              osd.245
> 246    ssd    1.74699   1.00000  1.7 TiB   940 GiB   925 GiB  535 MiB   15 GiB   848 GiB  52.58  0.92  143      up              osd.246
> 247    ssd    1.74699   1.00000  1.7 TiB  1008 GiB   993 GiB  436 MiB   15 GiB   781 GiB  56.35  0.99  135      up              osd.247
> 248    ssd    1.74699   1.00000  1.7 TiB   1.0 TiB   1.0 TiB  452 MiB   15 GiB   728 GiB  59.28  1.04  141      up              osd.248
> 249    ssd    1.74699   1.00000  1.7 TiB   826 GiB   812 GiB  375 MiB   14 GiB   962 GiB  46.21  0.81  128      up              osd.249
> 250    ssd    1.74699   1.00000  1.7 TiB   923 GiB   907 GiB  435 MiB   15 GiB   866 GiB  51.60  0.91  136      up              osd.250
> 251    ssd    1.74699   1.00000  1.7 TiB   900 GiB   884 GiB  567 MiB   15 GiB   889 GiB  50.30  0.88  142      up              osd.251
>
> **** NEW-NODE:
> -65          20.96375         -   21 TiB    16 TiB    16 TiB  5.4 GiB  125 GiB   5.1 TiB  75.47  1.33    -                  host E14
> 324    ssd    1.74698   1.00000  1.7 TiB   1.4 TiB   1.3 TiB  431 MiB   10 GiB   399 GiB  77.72  1.37  124      up              osd.324
> 325    ssd    1.74698   1.00000  1.7 TiB   1.2 TiB   1.2 TiB  436 MiB  9.6 GiB   579 GiB  67.62  1.19  107      up              osd.325
> 326    ssd    1.74698   1.00000  1.7 TiB   1.3 TiB   1.3 TiB  446 MiB   10 GiB   495 GiB  72.35  1.27  107      up              osd.326
> 327    ssd    1.74698   1.00000  1.7 TiB   1.4 TiB   1.4 TiB  506 MiB   11 GiB   355 GiB  80.14  1.41  126      up              osd.327
> 328    ssd    1.74698   1.00000  1.7 TiB   1.3 TiB   1.3 TiB  432 MiB   10 GiB   477 GiB  73.33  1.29  114      up              osd.328
> 329    ssd    1.74698   1.00000  1.7 TiB   1.4 TiB   1.4 TiB  530 MiB   11 GiB   343 GiB  80.81  1.42  124      up              osd.329
> 330    ssd    1.74698   1.00000  1.7 TiB   1.2 TiB   1.2 TiB  432 MiB   10 GiB   537 GiB  69.99  1.23  113      up              osd.330
> 331    ssd    1.74698   1.00000  1.7 TiB   1.4 TiB   1.4 TiB  473 MiB   11 GiB   353 GiB  80.25  1.41  123      up              osd.331
> 332    ssd    1.74698   1.00000  1.7 TiB   1.4 TiB   1.4 TiB  459 MiB   11 GiB   370 GiB  79.30  1.39  124      up              osd.332
> 333    ssd    1.74698   1.00000  1.7 TiB   1.3 TiB   1.2 TiB  438 MiB   10 GiB   500 GiB  72.05  1.27  111      up              osd.333
> 334    ssd    1.74698   1.00000  1.7 TiB   1.4 TiB   1.4 TiB  433 MiB   11 GiB   393 GiB  78.00  1.37  123      up              osd.334
> 335    ssd    1.74698   1.00000  1.7 TiB   1.3 TiB   1.3 TiB  488 MiB   10 GiB   464 GiB  74.08  1.30  119      up              osd.335
>
> ---------------------
> I can't upgrade to newer versions because I have a personal project
> designed around the current Linux and Ceph versions; upgrading would
> mean a lot of work for me.
>
> Maybe the JJ balancer will do a better job, as you recommended, but
> better balance is not what I'm after at this moment.
>
> First of all, I want to understand why this happened: what changed
> between Nautilus and Octopus such that the same OSD deployment method
> produces near-full new OSDs with a similar PG count.
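
A quick back-of-the-envelope from the df output above, using one sample OSD from each node (numbers copied from the table; 1.4 TiB taken as ~1434 GiB), suggests the new OSDs store roughly twice as much per PG as the old ones — which points at per-object/allocation overhead rather than placement imbalance:

```python
def gib_per_pg(used_gib: float, pgs: int) -> float:
    """Average on-disk footprint per placement group on one OSD."""
    return used_gib / pgs

old = gib_per_pg(728, 125)    # osd.240 on old host E10: 728 GiB, 125 PGs
new = gib_per_pg(1434, 124)   # osd.324 on new host E14: ~1.4 TiB, 124 PGs
print(f"old: {old:.1f} GiB/PG  new: {new:.1f} GiB/PG  ratio: {new/old:.2f}")
# prints: old: 5.8 GiB/PG  new: 11.6 GiB/PG  ratio: 1.99
```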
>
> -Best
>
> Stefan Kooman <ste...@bit.nl> wrote on Mon, 7 Jul 2025 at 22:22:
> >
> > On 7/7/25 18:34, mhnx wrote:
> > > Hello!
> > >
> > > A few years ago I built a "dc-a:12 + dc-b:12 = 24"-node Ceph
> > > cluster with Nautilus v14.2.16.
> > > A year ago the cluster was upgraded to Octopus and it was running fine.
> > > Recently I added 4+4=8 new nodes with identical hardware and SSD drives.
> > > When I created the OSDs with Octopus, the cluster usage increased
> > > from 50% to 78%!!
> >
> > What does "ceph osd df tree" give you?
> >
> > >
> > > The weird problem is that the new OSDs become nearfull and hold
> > > more data even though they have the same number of PGs or fewer.
> > >
> > > I had to reweight the new OSDs to 0.9 to equalize their usage.
> > > I increased the PG count from 8192 to 16384 and ran the balancer;
> > > it became worse and I have 84% usage now!
> >
> > Remember that Ceph is limited by the fullest OSD in the cluster.
> > Do you have old clients? If not, try to get rid of reweight and start
> > using upmap. It is way more efficient in getting a cluster well
> > balanced. I would recommend using this balance script:
> > https://github.com/TheJJ/ceph-balancer
> >
> > Maybe first reset all the reweights (first do: ceph osd set nobackfill).
> > Then run this script:
> > https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py
> >
> > And after that run the ceph-balancer script. That should help
> > tremendously if the cluster is imbalanced.
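
The sequence described above can be outlined roughly as follows — a hedged sketch, not a tested procedure: the OSD id is an example, and you should review both scripts' READMEs (and the commands they emit) before running anything:

```shell
# 1) Pause data movement while changing weights
ceph osd set nobackfill

# 2) Reset override reweights back to 1.0 (repeat for each adjusted OSD)
ceph osd reweight 324 1.0   # example id

# 3) Freeze current placements as pg-upmap entries so the reweight reset
#    does not trigger a mass rebalance; the script prints ceph commands,
#    so inspect the output before executing it
./upmap-remapped.py | tee /tmp/upmap-cmds.sh
sh /tmp/upmap-cmds.sh

# 4) Allow backfill again and let the cluster settle
ceph osd unset nobackfill

# 5) Then run TheJJ ceph-balancer (placementoptimizer.py) per its README
#    to generate further upmaps for an even distribution
```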
> >
> >
> > >
> > > I guess the OSD or PG code changed between Nautilus and Octopus
> > > and that is causing this problem.
> >
> > What version of Octopus are you running?
> >
> > >
> > > Can anyone help me with experience or knowledge about this?
> > > What should I do?
> > >
> > > My solution idea:
> > > I'm thinking of destroying and re-creating the old OSDs as a
> > > solution, but I would need to re-create 144x 3.8 TB SAS SSD OSDs,
> > > which means 4-5 days of maintenance.
> > >
> > > Also, I have 2 OSDs per drive because that was recommended in the
> > > Nautilus days. Should I keep this layout, or should I use 1 OSD
> > > per 3.8 TB SAS SSD? What is the recommendation for Octopus and Quincy?
> >
> > I would recommend upgrading to newer, supported versions, maybe go to
> > Pacific and then Reef. Modern versions of Ceph do not gain from
> > deploying multiple OSDs per drive. What Ceph services are you running
> > (MDS, RGW, RBD)?
> >
> > Gr. Stefan
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
