Hi Adam,

Please see my comments inline.


Thanks,

Igor

On 05.05.2025 18:17, Adam Prycki wrote:
Hi Igor,

It looks like you're right.

ceph tell osd.1750 status
{
    "cluster_fsid": "bec60cda-a306-11ed-abd9-75488d4e8f4a",
    "osd_fsid": "388906f8-1df8-45a2-9895-067ee2e0c055",
    "whoami": 1750,
    "state": "active",
    "maps": "[1502335~261370]",
    "oldest_map": "1502335",
    "newest_map": "1763704",
    "cluster_osdmap_trim_lower_bound": 1502335,
    "num_pgs": 0
}

One of the OSDs I've checked has about 261K maps.
Could this cause BlueStore to grow to ~380 GiB?
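Quite possibly. A rough back-of-the-envelope check, assuming the space is dominated by osdmaps: 380 GiB / ~261,370 epochs ≈ 1.5 MiB per epoch. Each retained epoch is typically stored as a full map (osdmap.N) plus an incremental (inc_osdmap.N) in the meta collection, and on a cluster with OSD ids up to at least 1814 a full map in the low-MiB range is plausible, so the numbers are consistent.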

You'd better check object sizes/counts in the meta pool using ceph-objectstore-tool and estimate the totals:

ceph-objectstore-tool --data-path <path-to-osd> --op meta-list > meta_list; wc -l meta_list

ceph-objectstore-tool --data-path <path-to-osd> --pgid meta <oid> dump | grep size

The latter command obtains the onode size for a given object. Just use a few oids from the meta_list file corresponding to the relevant onode types (osdmap* and inc_osdmap* are of particular interest; other types are worth checking if they show up in bulk).
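A rough way to script that (a sketch with an example data path; it assumes the OSD is stopped, since ceph-objectstore-tool needs exclusive access, and that meta-list emits one JSON object spec per line, which the tool accepts back verbatim):

grep '"osdmap\.' meta_list | head -20 | while read -r oid; do
    # dump each sampled full-map object and pull out its size
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1750 \
        --pgid meta "$oid" dump | grep '"size"'
done
# average these, then multiply by the osdmap.* / inc_osdmap.* line
# counts in meta_list for a total estimate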


What settings could affect the number of maps stored on an OSD?
The only thing that comes to mind is mon_min_osdmap_epochs, which I configured to 2000 a while ago.
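For reference, the value currently in effect can be verified with (assuming the centralized config database of recent releases):

ceph config get mon mon_min_osdmap_epochs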

osdmaps should be trimmed automatically in a healthy cluster. Perhaps the ongoing rebalancing, or some other issue, prevents that. The first question would be: how do the osdmap epochs evolve? Is oldest_map increasing? Is the delta decreasing?
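E.g. something like this (a sketch, assuming jq is available; osd.1750 taken from above) tracks the window over time:

while sleep 600; do
    # timestamp, oldest_map, newest_map, epoch delta
    ceph tell osd.1750 status | jq -r \
        '[(now | todate), .oldest_map, .newest_map,
          ((.newest_map | tonumber) - (.oldest_map | tonumber))] | @tsv'
done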


We are also running a long rebalance on this cluster. At the beginning of the year we expanded it, and we are still waiting for the HDD pool to finish rebalancing; only about 2% of the data is left to move. It's taking so long because I wanted to stick with the mclock scheduler at first. Rebalancing shouldn't affect the SSD pool directly, but it keeps the HDD pools from being 100% active+clean.
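For reference, the PGs that are still not clean can be listed with:

ceph pg dump_stuck unclean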

Evacuating all PGs from an OSD doesn't lower its usage.
For example (ceph osd df output for osd.1814):
1814    ssd        0   1.00000  447 GiB  382 GiB  381 GiB  351 MiB   1.0 GiB  65 GiB  85.44  1.00    0      up
I assume it's a BlueStore issue and not a PG/CRUSH-related one.
It's an issue with osdmaps not being properly trimmed; it's unrelated to BlueStore. As far as I know, the monitors won't trim osdmaps past the oldest epoch still needed by PGs that aren't clean (roughly the minimum last_epoch_clean), so the long-running rebalance is a plausible culprit.

Best regards
Adam

On 05.05.2025 16:20, Igor Fedotov wrote:
Hi Adam,

I recall a case where pretty high OSD utilization was caused by osdmaps not being trimmed.

You might want to check the output of the 'ceph tell osd.N status' command and compare the oldest_map vs. newest_map values.

Having a delta in hundreds of thousands of epochs would confirm the case.


Thanks,

Igor

On 05.05.2025 16:41, Adam Prycki wrote:
Sorry guys, I forgot to add the attachments to my previous email.

In attachments:
ceph osd crush rule dump
ceph osd df tree

Best regards
Adam Prycki


_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io