Hi Adam,

Please see my comments inline.


Thanks,

Igor

On 05.05.2025 18:17, Adam Prycki wrote:
Hi Igor,

It looks like you're right.

ceph tell osd.1750 status
{
    "cluster_fsid": "bec60cda-a306-11ed-abd9-75488d4e8f4a",
    "osd_fsid": "388906f8-1df8-45a2-9895-067ee2e0c055",
    "whoami": 1750,
    "state": "active",
    "maps": "[1502335~261370]",
    "oldest_map": "1502335",
    "newest_map": "1763704",
    "cluster_osdmap_trim_lower_bound": 1502335,
    "num_pgs": 0
}

One of the OSDs I've checked has about 261K maps.
Could this cause BlueStore to grow to ~380 GiB?
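Quite possibly. A rough back-of-the-envelope check, assuming the space is dominated by osdmaps: 380 GiB / ~261,370 epochs ≈ 1.5 MiB per epoch. Each retained epoch is typically stored as a full map (osdmap.N) plus an incremental (inc_osdmap.N) in the meta collection, and on a cluster with OSD ids up to at least 1814 a full map in the low-MiB range is plausible, so the numbers are consistent.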

You'd better check object sizes/counts in the meta pool using ceph-objectstore-tool and estimate the totals:

ceph-objectstore-tool --data-path <path-to-osd> --op meta-list > meta_list; wc -l meta_list

ceph-objectstore-tool --data-path <path-to-osd> --pgid meta <oid> dump | grep size

The latter command obtains the onode size for a given object. Just use a few oids from the meta_list file corresponding to the relevant onode types (osdmap* and inc_osdmap* are of particular interest; other types are worth checking if they show up in bulk).
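A rough way to script that (a sketch with an example data path; it assumes the OSD is stopped, since ceph-objectstore-tool needs exclusive access, and that meta-list emits one JSON object spec per line, which the tool accepts back verbatim):

grep '"osdmap\.' meta_list | head -20 | while read -r oid; do
    # dump each sampled full-map object and pull out its size
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1750 \
        --pgid meta "$oid" dump | grep '"size"'
done
# average these, then multiply by the osdmap.* / inc_osdmap.* line
# counts in meta_list for a total estimate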


What settings could affect the number of maps stored on an OSD?
The only thing that comes to mind is mon_min_osdmap_epochs, which I configured to 2000 a while ago.
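For reference, the value currently in effect can be verified with (assuming the centralized config database of recent releases):

ceph config get mon mon_min_osdmap_epochs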

osdmaps should be trimmed automatically in a healthy cluster. Perhaps the ongoing rebalancing, or some other issue, prevents that. The first question would be: how do the osdmap epochs evolve? Is oldest_map increasing? Is the delta decreasing?
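E.g. something like this (a sketch, assuming jq is available; osd.1750 taken from above) tracks the window over time:

while sleep 600; do
    # timestamp, oldest_map, newest_map, epoch delta
    ceph tell osd.1750 status | jq -r \
        '[(now | todate), .oldest_map, .newest_map,
          ((.newest_map | tonumber) - (.oldest_map | tonumber))] | @tsv'
done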


We are also running a long rebalance on this cluster. At the beginning of the year we expanded it, and we are still waiting for the HDD pool to finish rebalancing; only about 2% of the data is left to move. It's taking so long because I wanted to stick with the mclock scheduler at first. Rebalancing shouldn't affect the SSD pool directly, but it keeps the HDD pools from being 100% active+clean.
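For reference, the PGs that are still not clean can be listed with:

ceph pg dump_stuck unclean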

Evacuating all PGs from an OSD doesn't lower its usage.
For example (ceph osd df output for osd.1814):
1814    ssd        0   1.00000  447 GiB  382 GiB  381 GiB  351 MiB   1.0 GiB  65 GiB  85.44  1.00    0      up
I assume it's a BlueStore issue and not a PG/CRUSH-related one.
It's an issue with osdmaps not being properly trimmed; it's unrelated to BlueStore. As far as I know, the monitors won't trim osdmaps past the oldest epoch still needed by PGs that aren't clean (roughly the minimum last_epoch_clean), so the long-running rebalance is a plausible culprit.

Best regards
Adam

On 05.05.2025 16:20, Igor Fedotov wrote:
Hi Adam,

I recall a case where pretty high OSD utilization was caused by osdmaps not being trimmed.

You might want to check the output of the 'ceph tell osd.N status' command and compare the oldest_map vs. newest_map values.

Having a delta in hundreds of thousands of epochs would confirm the case.


Thanks,

Igor

On 05.05.2025 16:41, Adam Prycki wrote:
Sorry guys, I forgot to add the attachments to my previous email.

In attachments:
ceph osd crush rule dump
ceph osd df tree

Best regards
Adam Prycki


_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io